About this board

Previous archives are at /Archive 1

Potential webinar on Zotero translator coding

Summary by Czar

A tech talk was scheduled for February 2016

Czar (talkcontribs)

Behind Citoid and likely any of our future citation parsing services is a series of Zotero "translators". I recently wrote one and it had a bit of a learning curve. However, Sebastian from the Zotero project has offered to give a webinar on translator coding if there is interest. It would include basic scraping by manual code and the streamlined "Framework" (less coding). I'm trying to gauge interest, so please leave a note on my enwp talk page with the nature of your interest (e.g., are you just curious or do you plan to write translators for specific sites?). Please share with other language Wikipedias and let me know.

Elitre (WMF) (talkcontribs)

Hey @Czar. My colleagues and I were wondering whether Sebastian could be interested in turning the "webinar" into a Tech talk? We could make it happen around February or March, if so. Thanks for your help!

This post was hidden by Czar (history)
Czar (talkcontribs)

Here is Sebastian's webpage and Twitter (not sure if the Northwestern address still works). Let's try Twitter

Czar (talkcontribs)

I confirmed that Sebastian's email in the linked CV is the right one, and he said Feb/March is fine

Elitre (WMF) (talkcontribs)


Elitre (WMF) (talkcontribs)

Learn how to write a translator! A related Tech Talk is happening on Feb 29th. Join us!

Use of public API

Tinss (talkcontribs)

When setting $wgCitoidServiceUrl to!/Citation/getCitation, I get an XMLHttpRequest error in my browser when adding a citation, and it seems citoid is querying this URL with the old API. I'm using REL1_31 of the Citoid extension. Citoid/API indicates that the API has changed. Is this change only available through restbase and not $wgCitoidServiceUrl?

We run a small wiki and figured we'd piggyback off Wikipedia's citoid service instead of hosting our own.

Thanks for any help!

Mvolz (WMF) (talkcontribs)

Yes, the "old" API is actually citoid's native API, and that's what wgCitoidServiceURL expects, but we deprecated the public version of that a few years back, and now we're only running the one behind restbase. If mw.config.get( 'wgVisualEditorConfig' ).fullRestbaseUrl exists, then Citoid uses the restbase one, but I'm pretty sure if you set that in the VE config it will mess up the rest of your VE installation :).

Feel free to file a bug and/or submit a PR:

In the meantime you could also just manually set fullRestbaseUrl here instead of getting it from the VE config:

Tinss (talkcontribs)

@Mvolz (WMF), I've managed to deploy restbase on my wiki and was wondering how to forward requests for /data/citation/{format}/{query} to{format}/{query}?

I've also noticed you've patched Citoid to have the option to make use of a public API behind restbase. Until that code makes it past review, I'll try to use the aforementioned way.

Thanks a lot!

Mvolz (WMF) (talkcontribs)

fr and en wikipedia APIs perform identically, so there's probably no particular need to do that; internationalisation is not great, but what it does do is pass on the accept-language header. The content of the accept-language header is mw.config.get( 'wgContentLanguage' ) so as long as the content language of your wiki is fr, it'll get French when available.
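A minimal sketch of the header behaviour described above, written in Python rather than the extension's own JavaScript. The base URL, the example DOI, and the helper name `build_citation_request` are assumptions for illustration; only the `/data/citation/{format}/{query}` path and the idea of passing the wiki's content language as Accept-Language come from this thread. The request is only built, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL of a RESTBase-style public API; substitute your own.
ASSUMED_BASE_URL = "https://en.wikipedia.org/api/rest_v1"


def build_citation_request(query, content_language, fmt="mediawiki",
                           base_url=ASSUMED_BASE_URL):
    """Build (but do not send) a citation request carrying Accept-Language."""
    # Percent-encode the query (URL, DOI, PMID, ...) so it fits in one path segment.
    url = f"{base_url}/data/citation/{fmt}/{quote(query, safe='')}"
    return Request(url, headers={"Accept-Language": content_language})


# "10.1000/example" is a made-up DOI used only to show the encoding.
request = build_citation_request("10.1000/example", "fr")
print(request.full_url)
print(request.header_items())
```

With a content language of fr, the only per-wiki difference in the request is the header value, which is why pointing at the fr or en endpoint should not matter.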

Tinss (talkcontribs)

Thanks for the info! Any idea how I'm supposed to configure my local restbase to forward requests to another remote API?

Mvolz (WMF) (talkcontribs)

The change is merged, so you don't need to configure restbase, you should just be able to configure the citoid extension (if you update to 1.0.0): Citoid#Citoid extension

Nicolas NALLET (talkcontribs)
Reply to "Use of public API"

Bugs: pmc, redundant urls, and ref tags

Boghog (talkcontribs)

Not sure where the best place to post this is. But here goes:

I never use this tool myself, but I frequently am cleaning up after others that do. Three problems that I have noticed:

(1) In the PMC parameter, the value should be an integer and not prefixed with "PMC":

incorrect: |pmc=PMCxxxxx (where xxxxx = integer)

correct: |pmc=xxxxx

The incorrect form throws an error that must be manually fixed.

(2) URLs are sometimes added that are identical to that already produced by |doi=, |pmid=, or |pmc=. In this case, the URL should be suppressed since it is redundant.

(3) Ref tags that take the form of ":0", ":1", ":2". While they are unique, they are not very informative. Better to create a Harvard-style ref tag in the form of first author's last name + year of publication (i.e., "Smith_2017" is much more readable than ":0").

(talkcontribs)
Boghog (talkcontribs)

Thanks for the links; however, it is important to distinguish between a template embedded in wiki markup and how that template is rendered. The NIH recommendations only concern how the citations are rendered, not how they are entered in a {{cite journal}} template. The NIH has no recommendation on the syntax of templates.

The {{cite journal}} template renders the citation very close to the NIH recommendations (PMCxxxxx). The only difference is that {{cite journal}} adds a space between PMC and xxxxx. PMC in turn contains a wikilink to a Wikipedia article that explains what PMC stands for.

It is completely unnecessary to prepend the parameter value with "PMC". This is redundant. The template itself produces the correct rendering. Hence the problem is with Citoid, not {{cite journal}}.

(talkcontribs)

Well, that's a double-edged sword. It is true that template markup differs from rendering; that means that it doesn't make any difference to a template if it receives pmc = PMC XXXX or pmc = XXXX, since the prefix can be removed by parser functions or Lua. So it comes down to aesthetics or the personal preferences of some users, because the template can be changed to accept both if there's consensus to do so.

The WMF developer's point in the phabricator task seems to be the same as the one in the publisher's site. The id is "PMC XXXXX", and the site also recommends "PMCID : PMC XXXX". Pages like Cancer AND Cholera don't seem to follow that recommendation anyway. While wikimedia users may prefer to use only integers to identify it, citoid and many wikimedia tools are also used by third parties, and such exemptions may not be wanted by them. There's also no guarantee that all wikipedias use the same format as english wikipedia, so it is possible that other wikis were doing the exact opposite, e.g. adding PMC where it only had an integer. The only alternatives are to change the code only for the wikis that prefer it that way, or leave it as is.

They stripped it previously without doing the research to understand the "official" preferred value; this time they decided against it after doing the research.

Anyway, I'm not a WMF developer, nor associated with wikimedia, so you'll have to convince them :).

Boghog (talkcontribs)

Can you cite a single example of another Wikipedia including "PMC" in the parameter value? Most other Wikipedias don't support pmc to begin with, and the ones that do have generally followed the English Wikipedia's lead.

Citation Style 1 templates were created before Citoid. Hence it is reasonable that Citoid be compatible with Citation Style 1 templates, not the other way around. Furthermore, none of the other citation generation tools include "PMC" in the parameter value. I suppose that Citation Style 1 templates could be modified to optionally accept PMC in the parameter value, but this unnecessarily clutters citation templates with redundant characters. The parameter name is pmc, why does the parameter value also need to include pmc?

I have reopened the case on wikimedia.

(talkcontribs)

Can you cite a single example of another Wikipedia including "PMC" in the parameter value?

You sparked my curiosity, and here's one example:

Citation Style 1 templates were created before Citoid

True, but that would be overfitting. There are more than a hundred encyclopedias, and the tool should be neutral and not cater to a single one. That's what it does: it returns the id as requested. Stripping it means that it just returns a number.

The parameter name is pmc, why does the parameter value also need to include pmc?

Knowing the full PMC means that users can quickly verify if citoid actually gave them the right data, instead of guessing. This is particularly important because a number can mean just about any random thing (e.g. a PMID number instead of a PMC). Just because it adds it as a parameter doesn't mean it is necessarily correct.

Boghog (talkcontribs)

In the Greek Wikipedia, using a pmc parameter value where PMC is prepended is optional. However, the rendering when PMC is included in the parameter value looks strange (PMC is displayed twice and PMCID is not displayed). This should be fixed.

The whole purpose of Citoid is to automate the citation generation process so the editor doesn't have to worry about the accuracy of the parameter values. Since the data is downloaded from PubMed, the parameter values are correct with a high degree of confidence.

Numbers are not any less random if PMC is prepended to the parameter value. The best test to verify that the parameter value is correct is to follow the rendered PMC link.

Whatamidoing (WMF) (talkcontribs)

Boghog, I tend to think that the IP is correct: it would be good for {{cite journal}} to accept the "formally correct" id number. This has been discussed for a couple of months now, and I've not yet heard any technical reason for the template to choke when it's given the "official" id number. I don't see a need to require the official id number, but it should be able to cope with it.

Whatamidoing (WMF) (talkcontribs)
Guarapiranga (talkcontribs)

Two years on, and I'm still getting this error… The automated citation bot returns PMC=PMCPMCnnnnn (where n is a digit), and Wikipedia—correctly—identifies it as an error.

It's also generating dates that are wrong, according to Wikipedia's own style manual.

Boghog (talkcontribs)

The pmc=PMCPMCxxxxx is a new bug. I guess if two PMCs is good, three must be better ;-) I still do not understand the rationale for prefixing the numeric PMC value in citation templates. The pmc parameter name makes it clear what it is. In any case the redundant PMCs are being systematically removed from the English Wikipedia by gnomes and bots.
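As an illustration of the cleanup being discussed, here is a small sketch of how a prefixed value could be reduced to the bare integer form. It is not the code any bot or template actually uses; the function name `normalize_pmc` and the choice to tolerate repeated prefixes and spaces are assumptions made for this example.

```python
import re

# One or more leading "PMC" prefixes, each optionally followed by spaces.
_PMC_PREFIXES = re.compile(r"^(?:PMC\s*)+", re.IGNORECASE)


def normalize_pmc(value):
    """Return the bare integer part of a PMC identifier, as a string."""
    digits = _PMC_PREFIXES.sub("", value.strip())
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError(f"not a PMC identifier: {value!r}")
    return digits


for raw in ("PMC212403", "PMCPMC212403", "PMC 212403", "212403"):
    print(raw, "->", normalize_pmc(raw))
```

Every input in the loop reduces to the same bare number, so a template or tool could accept either form on input and still store a single one.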

Whatamidoing (WMF) (talkcontribs)

The (obviously wrong) double PMC problem has been fixed.

In case this comes up again, the point behind "prefixing the numeric PMC value" was that PubMed said that the complete identifier includes the "PMC" code. The English Wikipedia has historically used a truncated representation of the full id code. This is the equivalent of someone saying that their "correct" telephone number includes the local number but excludes the country and city/area codes. It's functional in some contexts, but it's not complete or unambiguous.

Boghog (talkcontribs)

Why is the context outside of Wikipedia even relevant? Will someone ever harvest the raw cite journal template to use outside of Wikipedia? I doubt it. Even if they did, it would be a trivial matter to do a search and replace of "pmc=xxxxx" with "PMCxxxxx".

Within Wikipedia, the template, the pmc parameter name, and the rendered citation all make the context crystal clear. Why do we need to specify pmc=PMCxxxxx? The second PMC is completely redundant and unnecessary. pmc=xxxxx is clear enough. The redundant PMC prefix is being treated as a maintenance issue and is systematically being removed by bots and gnomes.

In addition, PubMed itself is inconsistent. The PMC ID is prefixed by PMC, but the PMID ID is not prefixed with PMID? Why is that?

Finally, it is true that NIH grant applications must specify PMCID: PMCxxxxxx. Based on that logic, we would then need to replace pmc=PMCxxxxx with pmc=PMCID: PMCxxxxx. Right? But how often does an NIH grant applicant resort to copying a Wikipedia citation template for one of their own publications? I bet that it has never ever happened.

Whatamidoing (WMF) (talkcontribs)

I think that if you asked people familiar with the English Wikipedia's core community, then you'd likely find that they're (we're) generally perceived to be fanatical about getting the facts as absolutely correct as humanly possible. Consequently, it feels strange to me that these same people seem to shrug their shoulders and say "Yup, there exists a canonical form for that identifier, but I'd rather do it my way than get it right".

The fact that a bot is "un-correcting" the canonical form (when the template could be changed to accept both forms) is particularly strange. Can you think of any other instance in which an identifier or similar objective fact is deliberately rendered in the "wrong" form? CheBI, ChEMBL, DTXIDs all get their prefixes.  Why not this one?

Boghog (talkcontribs)

One needs to distinguish between data stored in a template and how it is rendered. This discussion is about the former.

In both {{infobox chemical}} and {{infobox drug}}, the CheBI parameter only accepts an integer value without a prefix. If one prepends the integer accession number with "CheBI", it generates a link that triplicates the CheBI prefix (e.g., CheBI: CheBiCheBixxxx) and the rendered external link is non functional. Ditto for ChEMBL. I am not sure what DTXID is.

Therefore, the cite journal pmc parameter is similar to the infobox chemical/drug parameters CheBI and ChEMBL except the pmc parameter will accept the pmc prefix, strip the (redundant) prefix from the rendered citation, and generate a functional link, but in addition, will flag it as a maintenance category.

The pmc value is stored in a parameter that is called pmc. So why do we need to repeat pmc in the parameter value?

Boghog (talkcontribs)

One additional item. The NCBI eutils search engine (NCBI administers PubMed Central) allows searches without the PMC prefix:

and also returns PMC accession numbers without the prefix. The output of the above URL reads in part:

<article-id pub-id-type="pmc">212403</article-id>

This suggests that the NCBI stores PMC accession numbers in their internal databases without the prefix and adds the prefix when rendered in PubMed pages. {{cite journal}} templates are analogous to a database that does not store the prefix in the pmc parameter value, but does display the prefix when the template is rendered in a Wikipedia article.
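To make the "stored without prefix, displayed with prefix" analogy concrete, here is a short sketch that reads the accession number out of the quoted element and adds the prefix only for display. The `<article-meta>` wrapper and the function name `pmc_accessions` are assumptions added so the fragment parses on its own; only the `<article-id>` line comes from the post above.

```python
import xml.etree.ElementTree as ET

# The <article-id> line is quoted from the post above; the <article-meta>
# wrapper is an assumption so that the fragment is a complete document.
SAMPLE = """<article-meta>
  <article-id pub-id-type="pmc">212403</article-id>
</article-meta>"""


def pmc_accessions(xml_text):
    """Return the text of every article-id element whose type is "pmc"."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("article-id")
            if el.get("pub-id-type") == "pmc" and el.text]


stored = pmc_accessions(SAMPLE)             # bare numbers, as stored
displayed = [f"PMC{n}" for n in stored]     # prefix added only for display
print(stored, displayed)
```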

The NIH regulations do not specify how pmc accession numbers should be stored in databases. They only specify how they should be displayed in a written grant application.

Reply to "Bugs: pmc, redundant urls, and ref tags"

Please, return more data: PMC, PMID, ISSN, Publisher

Summary by Arthurfragoso

There was a communication problem in the main WP servers with the NIH servers.

The SysOp team fixed it. :)

Arthurfragoso (talkcontribs)

As I have asked in,_PMC,_ISSN_and_publisher.

If I do a DOI search, it returns me the articles' title, authors etc. But it won't return PMID, PMC, ISSN, publisher and other data that I can easily get manually.

It's possible to get PMID and PMC from the DOI using this API:

Google Scholar always returns the ISSN and publisher. They don't have an API, but there are probably other services that do.

AManWithNoPlan (talkcontribs)

Citoid only returns primary data. It doesn’t do the next step of asking PubMed whether it recognizes this DOI.

Arthurfragoso (talkcontribs)

Ok, now I understand. It is Zotero that retrieves the data. I installed it and did some tests:

If I do a DOI search, it returns me fewer data:

If I do an URL search, it returns me DOI, PMID, PMC, etc:

I tried to install citoid, but it failed to build, so I tested the wikipedia server:

DOI search:

URL search:

The URL returns me unknown_error, that's probably why I don't use it. :(


Arthurfragoso (talkcontribs)
Mvolz (WMF) (talkcontribs)

Thanks for reporting. That nothing is getting through from the pubmed website at all is a separate issue from the one above! And a much more serious problem, unfortunately.

Mvolz (WMF) (talkcontribs)

So, we actually added support for this using the pubmed api in 2014. Unfortunately, the NIH api has a long, long history of falling over and causing citoid performance issues. As a result we added a config variable and turned off requesting extra identifiers from their service in production in 2017. Since citoid has to be snappy enough to work in real time on user request, I don't see us changing that unless a more reliable / faster service could be found to supply the info.

My advice would be for a bot to do this work, because that can work in the background and therefore response time isn't as critical.

Arthurfragoso (talkcontribs)

It's now working! Cheers! Yay! :)

Statistics

Sebastian Berlin (WMSE) (talkcontribs)

The preamble mentions statistics for usage of Citoid. Where could I find this? I'm interested in usage on SVWP after we added translators for a few Swedish sites last fall.

Mvolz (WMF) (talkcontribs)
Whatamidoing (WMF) (talkcontribs)
Mvolz (WMF) (talkcontribs)

Nope. That seems like a bug.

I know there have definitely been isbn inputs because I've submitted a few myself!

Mvolz (WMF) (talkcontribs)
Whatamidoing (WMF) (talkcontribs)

I wish that dashboard did some curve smoothing.

Are we any closer to being able to say that citoid is used some thousands of times per day?

Reply to "Statistics"

How can it work in private wiki in HTTPS

VincentNo15 (talkcontribs)

Unfortunately, I'm not a professional programmer or developer, so first of all I apologize for my humble explanation of my situation.

I want to establish my own private wiki with an SSL certificate on an AWS EC2 instance.

After setting up the citoid server and the Zotero server, both passed their tests.

The output of a PMID search entered as a URL in the web browser also showed a normal response.

But in my private wiki, which is behind an HTTPS proxy, citoid is not working in the visual editor; it shows the message "We couldn't make a citation for you. You can create one manually using the "Manual" tab above.". Chrome's debugging console reports the following. (I'm sorry to mask the domain name for privacy. I re-checked the spelling of the domain name in the configuration and there was no spelling error.)

  • Mixed Content: The page at (deleted to avoid abusefilter) was loaded over HTTPS, but requested an insecure XMLHttpRequest endpoint (deleted to avoid abusefilter). This request has been blocked; the content must be served over HTTPS.

I'm just curious about possibility of success. Is it the problem of cross site scripting? Then how can I do the cross site scripting. (I tried the $wgCrossSiteAJAXdomains configuration but it was not working)

Or do I have to customize citoid server to attach the SSL? If the cross-site scripting can cause the crucial security problem, then I'm willing to customize the citoid server. But I have no expertise about that. Please help me!!!
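The browser message quoted above describes a scheme mismatch rather than a cross-site scripting setting. A minimal sketch of that rule follows; the host names and port are hypothetical, the function name `is_mixed_content` is made up for this example, and it deliberately ignores the exceptions some browsers make for localhost.

```python
from urllib.parse import urlsplit


def is_mixed_content(page_url, endpoint_url):
    """True when a page loaded over HTTPS would request a plain-HTTP endpoint."""
    return (urlsplit(page_url).scheme == "https"
            and urlsplit(endpoint_url).scheme == "http")


# Hypothetical addresses standing in for the masked ones in the error message.
page = "https://wiki.example.org/wiki/Main_Page"
print(is_mixed_content(page, "http://wiki.example.org:1970/api"))    # blocked case
print(is_mixed_content(page, "https://wiki.example.org/citoid/api"))  # allowed case
```

As the error text itself says, the request is only allowed once the endpoint is served over HTTPS, so the address configured for the citoid service would need to be an https:// one.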

Mvolz (WMF) (talkcontribs)

(talkcontribs)

Sorry for the late response.

I've already changed that part, but in debugging mode this error message was printed:

"CORS request did not succeed"

Even though I'm not a professional developer, I think I may have to change some configuration of the Citoid server so that it works through the HTTPS proxy.

Can you help me to solve this problem? :(

VincentNo15 (talkcontribs)

I forgot to log in. I'm the same person who asked for help.

What do you think about using stunnel to get through?

Reply to "How can it work in private wiki in HTTPS"

citation bot

GreenC (talkcontribs)

How can we get User:Citation bot whitelisted? It does high volume and exceeds policy. They have their own Citoid install, but it is not as good as the Wikipedia install, which gives better results.

AManWithNoPlan (talkcontribs)

To reduce load, we do not query URLs for citations that are already "complete". Since historically we are used in the sciences, we do not consider a citation complete without a volume number. We have added a short list of popular websites (and such; it is purely my own writing, based upon a few wiki pages) that do not have volumes and such, to flag more citations as "complete". If we switch to Citoid, then it would be nice to get some feedback on which websites' URLs are the most common, so we can add more to that list.

Mvolz (WMF) (talkcontribs)
Mobrovac-WMF (talkcontribs)

Hm, there are no limits on the citation API end points AFAIK. Could you provide a sample response?

AManWithNoPlan (talkcontribs)

Citoid blocks us, so we run our own on the tool server.

AManWithNoPlan (talkcontribs)
Mvolz (WMF) (talkcontribs)
AManWithNoPlan (talkcontribs)

We got blocked so we now use our own on the tool servers. It does seem to time out at times also.

Mvolz (talkcontribs)

So I've asked operations, and we don't block IPs. However we do have rate limits of 1000 per 10 seconds (100 requests per second) which applies to all the mediawiki and restbase APIs. If you are receiving a 429 response, you just need to make sure you are adding a timeout in between requests so as not to exceed the limit.

Is 429 the response code you are getting?

(Also, it does take really long sometimes, depending on how long the time out is in your request package it may exceed it, so you may want to set a longer time out on your end.)

AManWithNoPlan (talkcontribs)

I will investigate. If it is 429, then we can sleep a little while and try again. The bot can be run by lots of people at once, so we might hit the limit. How long of a time-out are we talking about?

Mvolz (WMF) (talkcontribs)

The docs say "1000/10s (100/s long term, with 1000 burst)" - so I think if you exceed 1000 and wait 10s it should reset, but this is just from reading the docs.

For timeout I tried to find an outside value for you; in tests we allow a request to take up to 40 seconds, but there have even been cases of something taking 75 seconds to return in the wild:

And the caching layer sets a timeout of 360 seconds, so you will not get any responses that take longer than that.
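The "sleep a little while and try again" approach can be sketched as follows. This is only an illustration under stated assumptions: `fetch` stands for whatever function the bot uses to make one request and is assumed to return a `(status_code, body)` pair, and the default 10-second wait simply mirrors the "1000/10s" window quoted from the docs, not a documented retry policy.

```python
import time


def fetch_with_backoff(fetch, max_attempts=3, wait_seconds=10.0, sleep=time.sleep):
    """Call fetch() until it returns a status other than 429 or attempts run out.

    fetch is any zero-argument callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        # Give up on the last attempt; otherwise retry only on 429 (rate limited).
        if status != 429 or attempt == max_attempts:
            return status, body
        sleep(wait_seconds)


# Demonstration with a fake fetcher instead of a real HTTP call.
responses = iter([(429, ""), (429, ""), (200, "[]")])
pauses = []
result = fetch_with_backoff(lambda: next(responses), sleep=pauses.append)
print(result, pauses)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. In a real client, the per-request timeout would be a separate setting, chosen with the 40, 75 and 360 second figures above in mind.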

Reply to "citation bot"

We couldn't make a citation for you.

Timmy87 (talkcontribs)

Hello. I have a problem with citoid. After I insert a URL, DOI, or PMID, this error message is shown: "We couldn't make a citation for you. You can create one manually using the "Manual" tab above."

I think the "Zotero translation server" is functioning because the 'curl test' and 'npm test' results showed no problem. But I had a problem following the citoid installation manual: I cannot clone either REL1_29 or REL1_31. The error message in my shell is "Remote branch REL1_31 not found in upstream origin".

If you want to do an anonymous checkout:

git clone

Like VisualEditor, the master branch requires alpha builds of MediaWiki. If you're installing on an other mediawiki version, use the right branch like git clone -b REL1_29

I cannot find REL1_31 services/citoid

Thanks in advance.

P.S. I'm not a programmer; I just followed the MediaWiki manual, so my explanation of the error may lack information.

MediaWiki Certified by Bitnami 1.31.1-2 on Ubuntu 16.04

Timmy87 (talkcontribs)

I checked chrome debugging, I found error message like this :

"OPTIONS http://localhost:1970/api?action=query&format=mediawiki&search=123412345 net::ERR_CONNECTION_REFUSED"

$ curl -X GET --header 'Accept: application/json; charset=utf-8' 'http://localhost:1970/api?action=query&'

But, this process is working. What's the problem...

Mvolz (talkcontribs)
Timmy87 (talkcontribs)


Thanks for the answer! But I have already followed these instructions:

  1. Citoid#Configure Citoid on a Citoid-enabled wiki
  2. Citoid#Get "could not make a citation for you" every time

But it doesn't work.

So I followed the Citoid page again, but I couldn't find the right branch of services/citoid (not Extensions/citoid).

I think my problem is that I couldn't clone REL1_31. Could you explain how I can get REL1_31 of services/citoid?

(Although I changed following sentence 'REL1_29' to 'REL1_31', I see the error message :Cloning into 'citoid'...

fatal: Remote branch REL1_31 not found in upstream origin)

#Citoid page

Like VisualEditor, the master branch requires alpha builds of MediaWiki. If you're installing on an other mediawiki version, use the right branch like git clone -b REL1_29

I cannot find REL1_31 services/citoid

Mvolz (talkcontribs)
Timmy87 (talkcontribs)


Thanks for the answer. I agree. I solved the problem by changing the AWS security options. Actually, it was entirely my mistake: I hadn't opened port 1970 in the AWS settings.

With that solved, URL lookup in citoid works well. But the other functions (DOI, ISBN, PMID...) still have a problem :-(

Mvolz (talkcontribs)

Unfortunately ISBN won't work anymore without a developer key from worldcat, since they shuttered their xisbn service a year or so ago :/. DOI and PMID should be working.

Timmy87 (talkcontribs)

ah! I understood.

Whenever I try to enter the pmid, I can see following message.

Lua error in Module:Citation/CS1/Identifiers at line 47: attempt to index field 'wikibase' (a nil value).

I can't find someone who struggled with same problem.

PMID auto-search citation is what I really want to use and why I started mediawiki. :-(

Timmy87 (talkcontribs)
Mvolz (talkcontribs)
Mvolz (WMF) (talkcontribs)
Reply to "We couldn't make a citation for you."

Bloomberg sites say "are you a robot?"

Summary by Mvolz (WMF)

Tracked in task T210871

Roy17 (talkcontribs)

I just tried adding an article from Bloomberg with citoid, but apparently Bloomberg thinks citoid is a bot and asks for a captcha test. Is there anything Citoid developers can do? Or is it just up to Bloomberg IT staff?

PerfektesChaos (talkcontribs)

AFAIK the citoid query at external website declares itself to be Citoid, not a user agent (browser).

One should not try to cheat by pretending to be a browser, since that will be discovered easily on the following dialogue.

Yes, you are right, Bloomberg staff should make a silent exception, but that could be exploited by every other grabber then.

Reply to "Bloomberg sites say "are you a robot?""

Missing case in itemTypes

Dvorapa (talkcontribs)

Hello, in Citoid/itemTypes the "case" type is completely missing. Also, multiple examples are missing and some links don't even work. Could someone update the page a little?

Dvorapa (talkcontribs)
Mvolz (WMF) (talkcontribs)

I fixed a few things but we don't really maintain the page, another volunteer made it. Which links didn't work? Feel free to remove ones that don't.

Dvorapa (talkcontribs)

I will not remove any, as it is better to find an archive link than to have no clue what, for example, "hearing" stands for.

The link for thesis doesn't work, but I think I understand this one's meaning.

Dvorapa (talkcontribs)
Reply to "Missing case in itemTypes"