About this board

Previous archives are at /Archive 1

scribunto

ערן (talkcontribs)

Is Scribunto really needed for working with Citoid? If not, consider removing it from the documentation here.

Reply to "scribunto"

Confusing publisher and via parameters?

SMcCandlish (talkcontribs)

Someone mentioned to me that this tool is incorrectly outputting values like |publisher=Google Books, at least for en.wikipedia citation templates. If it's still doing that, it needs to be fixed ASAP to correctly use the |via= parameter for such intermediary distributors as Google Books, YouTube, Project Gutenberg, PubMed, JSTOR, etc. (if it hasn't been fixed in this regard already). I don't use VE, so I'm not even sure how to test this.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  04:55, 24 July 2016 (UTC)

Jc3s5h (talkcontribs)

Last I checked, Project Gutenberg retypes the text, they don't just scan it. So their books are new editions and they are the publisher. Citation templates don't provide any mechanism to show that an earlier edition was published by a different publisher.

SMcCandlish (talkcontribs)

Wikipedia would still not treat them as a publisher, and having tools automatically do so is misleading and wrong for our implementation of source citations. Project G. is a republisher, and that is what |via= is for, even if they did some hand cleanup of their OCR (and, yes, they do use OCR). It's no different from converting a book to PDF and then eBook format. That doesn't make you magically a new publisher, it just means you've done the work (including any after-automation cleanup) to format-shift something.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  18:50, 27 July 2016 (UTC)

Jc3s5h (talkcontribs)

I think the via parameter would imply that the republication is page-for-page and line-for-line identical to the original publication. Frequently in the past republications would be repaginated, so a passage that appeared on page 100 in the original might be on page 95 in the republication. I believe this was the case in the early days of Project G., although maybe not the more recent publications. Certainly an edition with different page numbering than the original should be treated as a new edition. If in doubt, the presumption should be that it is a new edition, to avoid making a false claim about what page a passage occurs on in the original (which the citing editor has never seen).

SMcCandlish (talkcontribs)

Well, it doesn't imply that. Electronic versions of documents are very often not "page-for-page and line-for-line identical" to the paper version, unless painstakingly made that way, usually in PDF form. If I write a book and release in PDF form through O'Reilly by special arrangement with them, and (within our licensing parameters) you use some tool to convert it to Kindle format, and this changes the layout in some ways, you don't get to claim to be my book's publisher. Doing so would actually reduce the apparent reliability of the source, since you're just some random person, not a well-known publisher. Per en:w:WP:SAYWHEREYOUGOTIT we do want a |via= parameter identifying that this is a copy from some intermediary source and not straight from the actual publisher.

Citing specific page numbers in e-documents is generally pointless unless they are in fact exact PDF scans; we have the |at= parameter to identify where in an electronic document the material can be found. E.g., I would use this to cite the online edition of the Chicago Manual of Style by section number, since it doesn't even have page numbers. Intelligent use of |at= allows people to find the same part in a paper edition, too.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  22:31, 28 July 2016 (UTC)

Jc3s5h (talkcontribs)

SMcCandlish wrote " If I write a book and release in PDF form through O'Reilly by special arrangement with them, and (within our licensing parameters) you use some tool to convert it to Kindle format, and this changes the layout in some ways, you don't get to claim to be my book's publisher."

Yes, I do. Whatever contractual agreements got put in place among you, me, and O'Reilly allows me to. Of course, if I'm violating copyright, my version shouldn't be cited at all. Or if it happens in the year 2200 and your copyright has expired, then I don't need anyone's permission to create a new edition.

Really no different than Bloomsbury being the original publisher of the Harry Potter books but Scholastic being the publisher for the English North America editions.

SMcCandlish (talkcontribs)

Taking a file and running a conversion program on it is nothing at all like Scholastic typesetting, designing a new cover for, creating new front matter for, printing, and distributing a NAm edition of a book originally published by Bloomsbury. I repeat: What you are talking about is nothing but format-shifting. It is no different from you posting a piece of digital art at DeviantArt, and me (pursuant to permissive licensing terms) putting a copy of it on my Facebook feed, which entails a new copy there, and a re-encoding, i.e. a format shift, and me and Facebook distributing the work to new people. Neither I nor Facebook become the publisher; DeviantArt remains the publisher, Facebook is the |via=. I suppose a philosophical argument can be made that they are two different kinds of publishing really, but who cares? The format-shifting and additional distribution isn't "publishing" for WP citation purposes.

This distinction is the very reason that the |via= parameter was created, to stop mis-attributing format-shifted and other repostings by random pseudo-publishers and content aggregators as the |publisher=, but retain the name of the actual publisher as such, and the name of the online distributor, so that people can find the work in the original form, not just on some possibly short-lived website, but can also use that website for convenience, and not be confused about the difference. For all we know, Google Books or Project Gutenberg could disappear tomorrow forever. The distinction is especially important for any entity that both reformats and distributes (|via=) material on behalf of external, traditional publishers, and also acts as the publisher itself, for new (generally amateur) content. Amazon is already doing this, and this kind of business model shift can happen at any time (e.g. HBO, Netflix, and Amazon are all publishers of original television and e-TV series, when formerly they were, respectively, a cable redistributor, a by-mail and later online stream redistributor, and an e-tailer, of previously published content). So, already, any such entity could appear as a |publisher= or a |via=, for different sources in the same article, and the distinction in each case would matter.

When it comes to historical sources, the original publisher information is also often of pertinent, even crucial, value, since significant differences can exist between the 1645 version of something from a London publisher, and a 1672 edition produced in Dublin, without any intermediary e-distributor like Project Gutenberg even being aware of it. Or – and this is telling – they often are aware of it, and so is Google Books, and take pains to note the actual publisher. Neither service claims to be the publisher of such works, and it is a weird form of original research for WP to insist that they are.

With that, I'm kind of tired of arguing round in circles on this stuff, and don't need to keep at it. We have separate parameters for these things for both a citation accuracy and utility reason (helping readers find and use sources) and a policy reason, en:w:WP:SAYWHEREYOUGOTIT, and neither the separation of these parameters nor the rationales for the separation are going to go away just because you don't see it the same way. I could even be totally wrong about every single thing I've said other than the last sentence and it wouldn't make any difference, since there's already a consensus to keep them separate, and it is not necessary for my analysis of why to be correct (though it is).  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  10:40, 1 August 2016 (UTC)

SMcCandlish (talkcontribs)

PS: I posted a cross-reference to this discussion, at en:w:Help talk:Citation Style 1, and it also turns out there is already an active thread about this over there:
 — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  14:23, 1 August 2016 (UTC)

David Eppstein (talkcontribs)

Perhaps the point has been missed here in the back-and-forth. Wikipedia citations use "publisher" to mean the original publisher of an edition of a work. Some of our information providers use the same keyword for a different meaning, the most recent content provider. We should not mix up these two meanings merely because they use the same keyword. If information providers are using "publisher" to mean something different than what we want it to mean, Citoid should not be blindly copying them.

SMcCandlish (talkcontribs)

Agreed, entirely. If the most recent content provider isn't the real publisher of the content, the former should be in the |via parameter. I don't know if there's a practical way to make Citoid aware of a big list of journal aggregators, news aggregators, book scanning sites, etc., to code them as |via instead of |publisher, but I hope so. If WP can maintain a URL blacklist that includes virtually all known URL redirectors (, etc.), I would think that it could maintain a list of content aggregators (pseudo-republishers).
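A minimal sketch of the reassignment suggested above, in Python. The intermediary list and the `reassign_publisher` helper are hypothetical and not part of Citoid; the names in the list are taken from the examples given earlier in this thread.

```python
# Hypothetical list of intermediary distributors; not an actual Citoid data set.
KNOWN_INTERMEDIARIES = {
    "google books",
    "youtube",
    "project gutenberg",
    "pubmed",
    "jstor",
}


def reassign_publisher(citation: dict) -> dict:
    """Return a copy in which an intermediary 'publisher' is moved to 'via'."""
    result = dict(citation)
    publisher = result.get("publisher", "")
    # Only move the value when no 'via' is already set, so nothing is overwritten.
    if publisher.strip().lower() in KNOWN_INTERMEDIARIES and "via" not in result:
        result["via"] = result.pop("publisher")
    return result
```

A list like this would need the same kind of ongoing maintenance as the URL blacklist mentioned above, and it cannot supply the real publisher; it only stops the intermediary from being labelled as one.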

Jc3s5h (talkcontribs)

It isn't quite as simple as original publisher vs. republisher. A republisher that simply copies images of the original publication and makes them available online should probably be named with the via parameter, or similar. But a publisher who re-typesets, and perhaps repaginates, an older work should be regarded as a full-fledged publisher. Some citation styles call for naming the original publisher in this situation, but the Wikipedia citation templates do not have a parameter for this purpose.

SMcCandlish (talkcontribs)

We already covered this above; one of the hazards of "necroposting" on a year-old thread. WP cites sources to help readers identify and find them and to help editors verify our content. We do not do so as a bibliographic database service; the purpose is not to track the history of a work. So, WP has no need of being able to identify a previous publisher's details. If you have a genuinely republished version with new typesetting and pagination, or even just a new foreword/introduction, this is the work you are citing, by that particular publisher. We don't care who published the first edition that had different font, page numbers, or lack of a "50th anniversary" foreword or whatever. It's just not relevant.

[Conceptual aside: It's really no different from a quote being in a New York Times article, perhaps with an "[editorial tweak]" in it, and a reporter's introduction ("According to X. Y. Zounds in The Zounds Method,"). We cite the newspaper article we found the quote in, not the original primary source of the statement (unless we also have that, and have checked it, and it's appropriate to "double-up" the citation for some reason, e.g. because another source misquoted it and caused a controversy). A new edition, a real republication, of a work is a similar matter; the original material being included is essentially a giant quotation, may have been editorially altered in the course of republication, and may have new lead-in material, a big "Foreword" or "Introduction to the Nth Edition" version of a journalist prefacing a quoted statement from a speech or document.]

By contrast, |via is important, for actual WP purposes and in addition to |publisher, to use for cases of pseudo-republishing, i.e. redistribution or format-shifting, such as if you got something via a scanning site or a content aggregator:

  1. That intermediary is incidental and has no effect we care about on the content itself (e.g., we DGaF if it has an aggregator's watermark on it; that isn't substantive and does not constitute an "edition" or a new "publishing" for WP purposes).
  2. The URL or the entire aggregator itself might not be there tomorrow. I have no insider info on the budgets of Project Gutenberg, Internet Archive, Google Books, or the journal aggregators, but these things cost money to operate. We do know that at least the first two of these have had funding struggles in the past, and still publicly seek donations to keep them going. The latter two are things a profit-minded business entity could axe at any moment, or start paywalling, as a simple business decision. The only consequence of such a failure is a dead URL. The actual citation is to the original work and remains valid; the work still exists and can be found. The dead link info is removed from the citation; we do not remove from citations the names of actual publishers who have ceased operation.
  3. It may not be the most convenient or effective way for a particular reader to get the work. Examples: if someone has taken a print-out of the WP article to a public library and all its Internet access kiosks, if there are any, are in use, but the library may have the original work on its shelves; or in a place where Internet access is costly and schlepping down a huge PDF is not practical, but looking at a paper copy you got via inter-library loan is free; or when a journal aggregator is not free for full text, with that only accessible for pay or at institutions with a subscription; and ... insert numerous other scenarios.

[Second conceptual aside: If I have a blog that I publish, and someone cites it, and the site goes down permanently, and it wasn't archived by the Wayback Machine or something equivalent, then that site is gone; i.e., it cannot be used by readers/editors for verification, ergo it is no longer a valid source citation. A conduit for a copy of a publication (e.g., and the publication itself ( or whatever): a big and clear difference. People seem to have unreasonable difficulty with the distinction, just "because Internet", i.e. because "a website is a website" in many minds; they're confusing the medium for the message, the delivery format for the content.]

List of translators available in Citoid

Lsanabria (talkcontribs)

The only URL with a list of translators that I could find was updated almost a year ago. Is there any way to find out what translators are available in Wikimedia?

Danmichaelo (talkcontribs)

Ping Mvolz. I'm also curious about how often the translators are updated from since I just submitted my first translator there :)

Mvolz (WMF) (talkcontribs)

Very intermittently. Once your translator gets merged feel free to ping me somewhere and I'll get it merged upstream :).

Danmichaelo (talkcontribs)

Thanks, Mvolz, it was merged now :)

Mvolz (WMF) (talkcontribs)

is the list of our translators; unfortunately, these also include translators which don't work with Citoid (ones which don't have the 'v' under the browser support flag, i.e. this one works and this one doesn't). There used to be tests run daily, with the results online, which better showed you which ones work and which ones don't, but those have been broken for a while now.

Czar (talkcontribs)

It would be great to have that translator results tool back up now that Refill is using Citoid. Even a quick and dirty indicator would be good. Otherwise it's harder to diagnose why links aren't expanding properly.

Mvolz (WMF) (talkcontribs)

Tracking in phabricator here:

Czar (talkcontribs)

For posterity, here's the link that tests the Citoid server for site/translator compatibility:

API shows the word "pages" only and I can't see any template fields

Atef81 (talkcontribs)

I have followed all the steps in the documentation over a week ago, and until now the template items don't show in the visual editor citation module.

I followed the steps in this section here: Citoid#Empty_references_appear, but all that appears in the API is this:

    "pages": {}

the template definitely has data in the :

"maps": {

    "citoid": { 

and also has the correct templatedata codes.

I executed the jobqueue several times to make sure there are no pending jobs, and null-edited the templates several times.
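For anyone diagnosing the same symptom, here is a small Python sketch of the check being described. It inspects an already-parsed TemplateData API response and lists the pages that carry a non-empty `maps.citoid` block. The function name and both sample responses are made up for this illustration, and the layout (a `pages` object keyed by page ID) is assumed from the output quoted above.

```python
def pages_with_citoid_map(response: dict) -> list:
    """List titles of pages whose TemplateData has a non-empty maps.citoid."""
    titles = []
    for page in response.get("pages", {}).values():
        citoid_map = page.get("maps", {}).get("citoid")
        if citoid_map:
            titles.append(page.get("title", ""))
    return titles


# Illustrative responses (invented for this sketch, not real API output).
empty = {"pages": {}}
filled = {
    "pages": {
        "1": {
            "title": "Template:Cite web",
            "maps": {"citoid": {"title": "title", "url": "url"}},
        }
    }
}
```

With these samples, `empty` yields no titles and `filled` yields the one template. An empty `pages` object would suggest the API found no TemplateData at all for the requested title, which is a different situation from TemplateData that exists but lacks a `citoid` map.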

Any suggestions are welcome, please. Thanks in advance.

Whatamidoing (WMF) (talkcontribs)

Which wiki are you trying to fix?

Atef81 (talkcontribs)

Sorry I missed this message. I am trying to fix a MediaWiki installation I have. It is not part of the Wikimedia project. Is this what you were asking, or did I miss the question? Thanks.

Whatamidoing (WMF) (talkcontribs)

That was my question. I don't know how to fix your problem, but if it was a WMF wiki, then I could try to find someone who might be able to fix it for you.

You've already tried all the things I would recommend to you. @Mobrovac-WMF, if you see this ping, do you have any advice?

Citoid: several publishers recently missing?

Albert Ke (talkcontribs)


I think Citoid is the most fantastic service for MediaWiki and I started using it about a year ago to automatically get information on many publications, based on the DOI. Really great work!

Not too long ago, Citoid switched to a new API endpoint and I was able to connect to that as well, but I noticed that e.g. the Wiley publisher is suddenly not supported anymore (but used to be supported in the past). Here are some examples of articles for which data can no longer be retrieved:

Wiley publishers:

AGU publications (part of Wiley):

However, the service works great for e.g.:

Would the Wiley publisher become available again through the API in the foreseeable future?

Thank you!,


I have posted this on phabricator as well (

Mvolz (WMF) (talkcontribs)

Hi Albert,

We have historically had trouble with scraping Wiley - something to do with their redirect system puts us into a redirect loop and can cause timeouts. But the DOI SHOULD work (since then we have the DOI to work with to query Crossref even if we can't access the website) so that sounds like a bug. I will look into it further.
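The intended behaviour can be sketched roughly as follows in Python. Both callables are hypothetical stand-ins supplied by the caller; this is not Citoid's actual code, only an illustration of "try the website, then fall back to Crossref by DOI".

```python
from typing import Callable, Optional


def cite_by_doi(
    doi: str,
    scrape_site: Callable[[str], dict],
    query_crossref: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Try scraping the publisher's page; on a timeout, use Crossref metadata."""
    try:
        return scrape_site(doi)
    except TimeoutError:
        # Scraping failed (e.g. a redirect loop timed out), but the DOI is
        # still known, so metadata can be requested from Crossref instead.
        return query_crossref(doi)
```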

Thanks for reporting this!

Albert Ke (talkcontribs)

Appreciated Mvolz,

Hope it is something easy to solve!


Whatamidoing (WMF) (talkcontribs)

Update: It looks like this is (mostly?) a problem involving RESTBase. I don't know when it will get fixed, but they're working on it.

ISBNs citations now with autofill!

Ocaasi (WMF) (talkcontribs)

Hi all, in the Wikipedia Library program at the Wikimedia Foundation we have been working with OCLC to make autofilled ISBN citations available, through using their WorldCat database. We have deployed the feature on all language Wikipedias: you can learn more about it on the Wikimedia blog:

Cheers, Jake

PKM (talkcontribs)

I've had problems with the publisher being entered in the author fields using the ISBN lookup. Is this a known problem, and do you want examples?

Ocaasi (WMF) (talkcontribs)

Not a known problem. Examples please! Even better if you file in Phabricator here:

PKM (talkcontribs)

Will do. Both Museum publications,

PKM (talkcontribs)

and done. It's consistent for books published by museums (three out of three tries). I just love edge cases. - Paula

Merrilee (talkcontribs)

Should this be added to the main description? Thanks Jake!

Reply to "ISBNs citations now with autofill!"

Bugs: pmc, redundant urls, and ref tags

Boghog (talkcontribs)

Not sure where the best place to post this is. But here goes:

I never use this tool myself, but I frequently am cleaning up after others that do. Three problems that I have noticed:

(1) in the PMC parameter, the value should be an integer and not prefixed with "PMC":

incorrect: |pmc=PMCxxxxx (where xxxxx = integer)

correct: |pmc=xxxxx

The incorrect form throws an error that must be manually fixed.

(2) URLs are sometimes added that are identical to those already produced by |doi=, |pmid=, or |pmc=. In this case, the URL should be suppressed since it is redundant.

(3) Ref tags that take the form of ":0", ":1", ":2". While they are unique, they are not very informative. Better to create a Harvard-style ref tag, in the form of first author's last name + year of publication (i.e., "Smith_2017" is much more readable than ":0").

(talkcontribs)

A bit of googling uncovered these:

1) &


At least for 1 it seems more like a case where the wikipedias don't want to follow the recommended (or required) referencing style of the publishers.

Boghog (talkcontribs)

Thanks for the links; however, it is important to distinguish between a template embedded in wiki markup and how that template is rendered. The NIH recommendations only concern how the citations are rendered, not how they are entered in a {{cite journal}} template. The NIH has no recommendation on the syntax of templates.

The {{cite journal}} template renders the citation very close to the NIH recommendations (PMCxxxxx). The only difference is that {{cite journal}} adds a space between PMC and xxxxx. PMC in turn contains a wikilink to a Wikipedia article that explains what PMC stands for.

It is completely unnecessary to prepend "PMC" to the parameter value. This is redundant. The template itself produces the correct rendering. Hence the problem is with Citoid, not {{cite journal}}.

(talkcontribs)

Well, that's a double-edged sword. It is true that template markup differs from rendering; that means that it doesn't make any difference to a template if it receives pmc = PMC XXXX or pmc = XXXX, since the prefix can be removed by parser functions or Lua. So it comes down to aesthetics or personal preferences of some users, because the template can be changed to accept both if there's consensus to do so.
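As a sketch of what "accept both" could look like, here is the normalization written in Python rather than in parser functions or Lua. The helper name `normalize_pmc` is hypothetical, and this is not code from Citoid or from {{cite journal}}.

```python
import re


def normalize_pmc(value: str) -> str:
    """Strip an optional leading 'PMC' so that only the integer part remains."""
    match = re.fullmatch(r"\s*(?:PMC)?\s*(\d+)\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a PMC identifier: {value!r}")
    return match.group(1)
```

Both "PMC12345" and "12345" come back as "12345", so the same stored value can be rendered either way, while anything that is not a number with an optional prefix is rejected instead of being passed through silently.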

The WMF developer's point in the phabricator task seems to be the same as the one in the publisher's site. The id is "PMC XXXXX", and the site also recommends "PMCID : PMC XXXX". Pages like Cancer AND Cholera don't seem to follow that recommendation anyway. While wikimedia users may prefer to use only integers to identify it, citoid and many wikimedia tools are also used by third parties, and such exemptions may not be wanted by them. There's also no guarantee that all wikipedias use the same format as english wikipedia, so it is possible that other wikis were doing the exact opposite, e.g. adding PMC where it only had an integer. The only alternatives are to change the code only for the wikis that prefer it that way, or leave it as is.

They stripped it previously ( without doing the research to understand the "official" preferred value, this time they decided against it after doing the research.

Anyway, I'm not a WMF developer, nor associated with Wikimedia; you'll have to convince them :).

Boghog (talkcontribs)

Can you cite a single example of another Wikipedia including "PMC" in the parameter value? Most other Wikipedias don't support pmc to begin with, and the ones that do have generally followed the English Wikipedia's lead.

Citation Style 1 templates were created before Citoid. Hence it is reasonable that Citoid be compatible with Citation Style 1 templates, not the other way around. Furthermore, none of the other citation generation tools include "PMC" in the parameter value. I suppose that Citation Style 1 templates could be modified to optionally accept PMC in the parameter value, but this unnecessarily clutters citation templates with redundant characters. The parameter name is pmc, why does the parameter value also need to include pmc?

I have reopened the case on wikimedia.

(talkcontribs)

Can you cite a single example of another Wikipedia including "PMC" in the parameter value?

You sparked my curiosity, and here's one example:

Citation Style 1 templates were created before Citoid

True, but that would be over fitting. There are more than a hundred encyclopedias, and the tool should be neutral and not cater to a single one. That's what it does, it returns the id as requested. Stripping it means that it just returns a number.

The parameter name is pmc, why does the parameter value also need to include pmc?

Knowing the full PMC means that users can quickly verify if citoid actually gave them the right data, instead of guessing. This is particularly important because a number can mean just about any random thing (e.g. a PMID number instead of a PMC). Just because it adds it as a parameter doesn't mean it is necessarily correct.

Boghog (talkcontribs)

In the Greek Wikipedia, using a pmc parameter value where PMC is prepended is optional. However, the rendering when PMC is included in the parameter value looks strange (PMC is displayed twice and PMCID is not displayed). This should be fixed.

The whole purpose of Citoid is to automate the citation generation process so the editor doesn't have to worry about the accuracy of the parameter values. Since the data is downloaded from PubMed, the parameter values are correct with a high degree of confidence.

Numbers are not any less random if PMC is prepended to the parameter value. The best test to verify that the parameter value is correct is to follow the rendered PMC link.

Whatamidoing (WMF) (talkcontribs)

Boghog, I tend to think that the IP is correct: it would be good for {{cite journal}} to accept the "formally correct" id number. This has been discussed for a couple of months now, and I've not yet heard any technical reason for the template to choke when it's given the "official" id number. I don't see a need to require the official id number, but it should be able to cope with it.

Date formatting

ArgonSim (talkcontribs)

Is there a way to change how Citoid locally handles date formatting when you automatically add a citation from a URL? Currently it adds ISO 8601 dates (YYYY-MM-DD), but the Portuguese Manual of Style recommends that dates should follow "dia de mês de ano" (16 de dezembro de 2016).
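For illustration only, the conversion in question looks like this in Python. The month table and the function name are assumptions written out for this sketch; on-wiki, the change would be made in the citation template or module, not in Python.

```python
from datetime import date

# Portuguese month names, written out to avoid depending on system locales.
MESES = [
    "janeiro", "fevereiro", "março", "abril", "maio", "junho",
    "julho", "agosto", "setembro", "outubro", "novembro", "dezembro",
]


def iso_para_extenso(iso_date: str) -> str:
    """Convert 'YYYY-MM-DD' into 'D de mês de AAAA'."""
    d = date.fromisoformat(iso_date)
    return f"{d.day} de {MESES[d.month - 1]} de {d.year}"


print(iso_para_extenso("2016-12-16"))  # 16 de dezembro de 2016
```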

Elitre (WMF) (talkcontribs)

You could change the template logic as suggested by Citoid/Enabling_Citoid_on_your_wiki#Access_date_is_formatted_differently_on_my_wiki. Right now I can't find a task or example with actual instructions about that, sorry.

Whatamidoing (WMF) (talkcontribs)

w:en:Module:Citation/CS1 at the English Wikipedia supports this. If you've got current versions of these templates, then use |df= to set your preferred date format. You can adjust the templates to do this automatically/in all cases except when locally over-ridden.

ArgonSim (talkcontribs)

The current pt-version does seem to support |df=, but I'm afraid if I try to correct it myself, I'll end up doing it wrong and breaking everything.

Whatamidoing (WMF) (talkcontribs)

I wonder if User:He7d3r or User:Dbastro would like to look into this. Since the code exists, it'd probably be good for the Portuguese Wikipedia to make the most desirable format be the default anyway.

Reply to "Date formatting"

Handling language codes

Elitre (WMF) (talkcontribs)

For those interested in this topic, please weigh in at . Thanks!

Reply to "Handling language codes"

Different cite template behaviour of websites

Tibbe Tibbe (talkcontribs)

Different websites get different citation templates from Citoid. E.g. gets Template:Cite news and gets Template:Cite web

In enwp all the templates from the cite* family are somewhat similar, but in German Wikipedia there is a huge difference between the template for online sources and for offline sources. Unfortunately "Cite news" gets translated to the template for offline sources. (More details at my German userpage)

Could you please tell me where Citoid determines the "target template"?

Elitre (WMF) (talkcontribs)

You probably want to seeückmeldungen/Archiv/1#Welt_Online .

Mvolz (WMF) (talkcontribs)

We use the Zotero itemType (full list of types here: and then map these to templates.

If there is a translator in Zotero, it can often correctly detect that a website is a news site. However, if there is no support for it in Zotero, we don't know that it's a news site from just the metadata, and as a fallback it is a website.
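A rough Python sketch of this lookup. The table is an illustrative subset with assumed template names, not the actual on-wiki mapping, and `template_for` is a hypothetical helper. Treating any unrecognised type like a website is likewise an assumption of this sketch; the point is only the fallback that applies when no translator identifies a news site.

```python
# Illustrative subset with assumed template names; the real mapping is
# configured per wiki.
TEMPLATE_BY_ITEM_TYPE = {
    "newspaperArticle": "Cite news",
    "webpage": "Cite web",
    "book": "Cite book",
    "journalArticle": "Cite journal",
}


def template_for(metadata: dict) -> str:
    """Pick a template from the Zotero itemType, defaulting to the web one."""
    # Assumption for this sketch: a missing or unknown type is treated as a
    # generic website, mirroring the fallback described above.
    item_type = metadata.get("itemType", "webpage")
    return TEMPLATE_BY_ITEM_TYPE.get(item_type, TEMPLATE_BY_ITEM_TYPE["webpage"])
```

On a wiki where the news template is meant for offline sources, the same structure shows where a local override would go: point "newspaperArticle" at whichever local template handles online news.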

You can see what the "type" is for any given link in citoid by going to and putting the link in. The Zotero type is in the field "itemType".

Zotero itemTypes are further mapped to Templates in this message:

If you want to know if there is a translator for a particular newspaper in Zotero, it may be helpful to look for tests here: (we only get results from the enabled ones)

( has the most updated tests)

Related bugs on Phabricator: (Poor support for non-English newspapers)
