Topic on Talk:Citoid

Bugs: pmc, redundant urls, and ref tags

Boghog (talkcontribs)

Not sure where the best place to post this is, but here goes:

I never use this tool myself, but I frequently am cleaning up after others that do. Three problems that I have noticed:

(1) In the PMC parameter, the value should be an integer and not prefixed with "PMC":

incorrect: |pmc=PMCxxxxx (where xxxxx = integer)

correct: |pmc=xxxxx

The incorrect form throws an error that must be manually fixed.

(2) URLs are sometimes added that are identical to those already produced by |doi=, |pmid=, or |pmc=. In this case, the URL should be suppressed since it is redundant.

(3) Ref tags take the form ":0", ":1", ":2". While they are unique, they are not very informative. It would be better to create a Harvard-style ref tag from the first author's last name plus the year of publication (e.g., "Smith_2017" is much more readable than ":0").
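To make points (2) and (3) concrete, here is a minimal Python sketch of both checks. It is only an illustration, not how Citoid is implemented: the function names are hypothetical, the URL patterns for DOI, PubMed, and PMC links are assumptions about what the identifier parameters link to, and the pmc value is assumed to be the bare integer already.

```python
from urllib.parse import urlsplit


def implied_urls(doi=None, pmid=None, pmc=None):
    """Return host+path forms of the links the identifiers already produce.

    The URL patterns below are assumptions made for this sketch.
    """
    urls = set()
    if doi:
        urls.add(f"doi.org/{doi.lower()}")
    if pmid:
        urls.add(f"www.ncbi.nlm.nih.gov/pubmed/{pmid}")
        urls.add(f"pubmed.ncbi.nlm.nih.gov/{pmid}")
    if pmc:
        urls.add(f"www.ncbi.nlm.nih.gov/pmc/articles/pmc{pmc}")
    return urls


def _canonical(url):
    """Drop the scheme, lowercase, and strip a trailing slash for comparison."""
    parts = urlsplit(url)
    return (parts.netloc + parts.path).lower().rstrip("/")


def is_redundant_url(url, doi=None, pmid=None, pmc=None):
    """True if |url= only repeats a link that |doi=, |pmid=, or |pmc= gives."""
    return _canonical(url) in implied_urls(doi, pmid, pmc)


def ref_name(first_author_last_name, year):
    """Build a Harvard-style ref name such as "Smith_2017"."""
    return f"{first_author_last_name.strip().replace(' ', '_')}_{year}"
```

With these definitions, `is_redundant_url("https://doi.org/10.1000/XYZ123", doi="10.1000/xyz123")` is true, so the url could be dropped, and `ref_name("Smith", 2017)` gives `"Smith_2017"`. A real tool would also need to handle name collisions (two Smith 2017 papers on one page), which this sketch ignores.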

197.218.88.122 (talkcontribs)
Boghog (talkcontribs)

Thanks for the links. However, it is important to distinguish between a template embedded in wiki markup and how that template is rendered. The NIH recommendations only concern how the citations are rendered, not how they are entered in a {{cite journal}} template. The NIH has no recommendation on the syntax of templates.

The {{cite journal}} template renders the citation very close to the NIH recommendations (PMCxxxxx). The only difference is that {{cite journal}} adds a space between PMC and xxxxx. PMC in turn contains a wikilink to a Wikipedia article that explains what PMC stands for.

It is completely unnecessary to prepend the parameter value with "PMC". This is redundant. The template itself produces the correct rendering. Hence the problem is with Citoid, not {{cite journal}}.

197.218.88.122 (talkcontribs)

Well, that's a double-edged sword. It is true that template markup differs from rendering, but that also means it makes no difference to a template whether it receives pmc = pmc XXXX or pmc = XXXX, since the prefix can be removed by parser functions or Lua. So it comes down to aesthetics or the personal preferences of some users, because the template can be changed to accept both if there's consensus to do so.
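As a rough illustration of how little logic that stripping needs, here is a sketch in Python rather than Lua or parser functions. The function names are hypothetical and this is not the actual template code; it simply accepts both forms (and a repeated prefix) and returns the bare number.

```python
import re

# Optional, possibly repeated "PMC" prefix (any case), then the digits.
_PMC_VALUE = re.compile(r"^\s*(?:PMC\s*)*(\d+)\s*$", re.IGNORECASE)


def normalize_pmc(value):
    """Return the numeric part of a PMC id as a string, or None if invalid."""
    match = _PMC_VALUE.match(value)
    return match.group(1) if match else None


def render_pmc(value):
    """Render the id with a single prefix, whichever form was entered."""
    number = normalize_pmc(value)
    return None if number is None else f"PMC {number}"
```

Both `render_pmc("212403")` and `render_pmc("PMC212403")` produce `"PMC 212403"`, so the stored form does not have to affect what readers see.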

The WMF developer's point in the Phabricator task seems to be the same as the one on the publisher's site. The id is "PMC XXXXX", and the site also recommends "PMCID : PMC XXXX". Pages like Cancer and Cholera don't seem to follow that recommendation anyway. While Wikimedia users may prefer to use only integers to identify it, Citoid and many Wikimedia tools are also used by third parties, who may not want such exemptions. There's also no guarantee that all Wikipedias use the same format as the English Wikipedia, so it is possible that other wikis were doing the exact opposite, e.g. adding PMC where there was only an integer. The only alternatives are to change the code only for the wikis that prefer it that way, or to leave it as is.

They stripped it previously (https://phabricator.wikimedia.org/T78144) without doing the research to understand the "official" preferred value. This time they decided against it after doing the research.

Anyway, I'm not a WMF developer, nor associated with Wikimedia, so you'll have to convince them :).

Boghog (talkcontribs)

Can you cite a single example of another Wikipedia including "PMC" in the parameter value? Most other Wikipedias don't support pmc to begin with, and the ones that do have generally followed the English Wikipedia's lead.

Citation Style 1 templates were created before Citoid. Hence it is reasonable that Citoid be compatible with Citation Style 1 templates, not the other way around. Furthermore, none of the other citation generation tools include "PMC" in the parameter value. I suppose that Citation Style 1 templates could be modified to optionally accept PMC in the parameter value, but this unnecessarily clutters citation templates with redundant characters. The parameter name is pmc, why does the parameter value also need to include pmc?

I have reopened the case on wikimedia.

197.218.88.122 (talkcontribs)

> Can you cite a single example of another Wikipedia including "PMC" in the parameter value?

You sparked my curiosity, and here's one example:

https://el.wikipedia.org/wiki/%CE%91%CE%BC%CE%B9%CE%BD%CE%BF%CF%84%CE%B5%CE%BB%CE%B9%CE%BA%CF%8C_%CE%AC%CE%BA%CF%81%CE%BF

> Citation Style 1 templates were created before Citoid

True, but that would be overfitting. There are more than a hundred encyclopedias, and the tool should be neutral and not cater to a single one. That's what it does: it returns the id as requested. Stripping it means that it just returns a number.

> The parameter name is pmc, why does the parameter value also need to include pmc?

Knowing the full PMC means that users can quickly verify if citoid actually gave them the right data, instead of guessing. This is particularly important because a number can mean just about any random thing (e.g. a PMID number instead of a PMC). Just because it adds it as a parameter doesn't mean it is necessarily correct.

Boghog (talkcontribs)

In the Greek Wikipedia, using a pmc parameter value where PMC is prepended is optional. However, the rendering when PMC is included in the parameter value looks strange (PMC is displayed twice and PMCID is not displayed). This should be fixed.

The whole purpose of Citoid is to automate the citation generation process so the editor doesn't have to worry about the accuracy of the parameter values. Since the data is downloaded from PubMed, the parameter values are correct with a high degree of confidence.

Numbers are not any less random if PMC is prepended to the parameter value. The best test to verify that the parameter value is correct is to follow the rendered PMC link.

Whatamidoing (WMF) (talkcontribs)

Boghog, I tend to think that the IP is correct: it would be good for {{cite journal}} to accept the "formally correct" id number. This has been discussed for a couple of months now, and I've not yet heard any technical reason for the template to choke when it's given the "official" id number. I don't see a need to require the official id number, but it should be able to cope with it.

Whatamidoing (WMF) (talkcontribs)
Guarapiranga (talkcontribs)

Two years on, and I'm still getting this error… The automated citation bot returns PMC=PMCPMCnnnnn (where n is a digit), and Wikipedia—correctly—identifies it as an error.

It's also generating dates that are wrong, according to Wikipedia's own style manual.

Boghog (talkcontribs)

The pmc=PMCPMCxxxxx is a new bug. I guess if two PMCs is good, three must be better ;-) I still do not understand the rationale for prefixing the numeric PMC value in citation templates. The pmc parameter name makes it clear what it is. In any case the redundant PMCs are being systematically removed from the English Wikipedia by gnomes and bots.

Whatamidoing (WMF) (talkcontribs)

The (obviously wrong) double PMC problem has been fixed.


In case this comes up again, the point behind "prefixing the numeric PMC value" was that PubMed said that the complete identifier includes the "PMC" code. The English Wikipedia has historically used a truncated representation of the full id code. This is the equivalent of someone saying that their "correct" telephone number includes the local number but excludes the country and city/area codes. It's functional in some contexts, but it's not complete or unambiguous.

Boghog (talkcontribs)

Why is the context outside of Wikipedia even relevant? Will someone ever harvest the raw cite journal template to use outside of Wikipedia? I doubt it. Even if they did, it would be a trivial matter to do a search and replace of "pmc=xxxxx" with "PMCxxxxx".
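For what it's worth, that search and replace could look something like the Python sketch below. It is an illustration only: the function name is hypothetical, and the regular expression assumes the simple `|pmc=` syntax discussed in this thread rather than every way a template parameter can be written.

```python
import re

# Matches "|pmc=12345" or "|pmc=PMC12345" inside template wikitext.
_PMC_PARAM = re.compile(r"\|\s*pmc\s*=\s*(?:PMC)?(\d+)", re.IGNORECASE)


def full_pmcids(wikitext):
    """Return every pmc value found in the wikitext as a prefixed PMCID."""
    return [f"PMC{number}" for number in _PMC_PARAM.findall(wikitext)]
```

For instance, `full_pmcids("{{cite journal |title=X |pmc=212403 |pmid=123}}")` returns `["PMC212403"]`.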

Within Wikipedia, the template, the pmc parameter name, and the rendered citation all make the context crystal clear. Why do we need to specify pmc=PMCxxxxx? The second PMC is completely redundant and unnecessary. pmc=xxxxx is clear enough. The redundant PMC prefix is being treated as a maintenance issue (see https://en.wikipedia.org/wiki/Category:CS1_errors:_PMC) and is systematically being removed by bots and gnomes.

In addition, PubMed itself is inconsistent. The PMC ID is prefixed with PMC, but the PMID is not prefixed with PMID. Why is that?

Finally, it is true that NIH grant applications must specify PMCID: PMCxxxxxx (see https://publicaccess.nih.gov/include-pmcid-citations.htm). Based on that logic, we would then need to replace pmc=PMCxxxxx with pmc=PMCID: PMCxxxxx. Right? But how often does an NIH grant applicant resort to copying a Wikipedia citation template for one of their own publications? I bet that it has never ever happened.


Whatamidoing (WMF) (talkcontribs)

I think that if you asked people familiar with the English Wikipedia's core community, then you'd likely find that they're (we're) generally perceived to be fanatical about getting the facts as absolutely correct as humanly possible. Consequently, it feels strange to me that these same people seem to shrug their shoulders and say "Yup, there exists a canonical form for that identifier, but I'd rather do it my way than get it right".

The fact that a bot is "un-correcting" the canonical form (when the template could be changed to accept both forms) is particularly strange. Can you think of any other instance in which an identifier or similar objective fact is deliberately rendered in the "wrong" form? ChEBI, ChEMBL, DTXIDs all get their prefixes. Why not this one?

Boghog (talkcontribs)

One needs to distinguish between data stored in a template and how it is rendered. This discussion is about the former.

In both {{infobox chemical}} and {{infobox drug}}, the ChEBI parameter only accepts an integer value without a prefix. If one prepends the integer accession number with "ChEBI", it generates a link that triplicates the ChEBI prefix (e.g., ChEBI: ChEBIChEBIxxxx) and the rendered external link is non-functional. Ditto for ChEMBL. I am not sure what DTXID is.

Therefore, the cite journal pmc parameter is similar to the infobox chemical/drug parameters ChEBI and ChEMBL, with one difference: the pmc parameter will accept the PMC prefix, strip the (redundant) prefix from the rendered citation, and generate a functional link, but it will also flag the citation in a maintenance category.

The pmc value is stored in a parameter that is called pmc. So why do we need to repeat pmc in the parameter value?

Boghog (talkcontribs)

One additional item. The NCBI eutils search engine (NCBI administers PubMed Central) allows searches without the PMC prefix:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=212403

and also returns PMC accession numbers without the prefix. The output of the above URL reads in part:

<article-id pub-id-type="pmc">212403</article-id>

This suggests that the NCBI stores PMC accession numbers in their internal databases without the prefix and adds the prefix when rendered in PubMed pages. {{cite journal}} templates are analogous to a database that does not store the prefix in the pmc parameter value, but does display the prefix when the template is rendered in a Wikipedia article.
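The store-bare, display-prefixed pattern can be sketched in a few lines of Python. This only parses the fragment quoted above (it makes no network request), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment quoted above from the efetch output.
SNIPPET = '<article-id pub-id-type="pmc">212403</article-id>'


def pmc_from_article_id(xml_text):
    """Return the bare PMC accession number from an <article-id> element."""
    element = ET.fromstring(xml_text)
    if element.tag == "article-id" and element.get("pub-id-type") == "pmc":
        return element.text.strip()
    return None


stored = pmc_from_article_id(SNIPPET)  # bare number, as stored
displayed = f"PMC{stored}"             # prefix added only for display
```

Here `stored` is `"212403"` and `displayed` is `"PMC212403"`, which mirrors what the pmc parameter and the rendered citation do.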

The NIH regulations do not specify how pmc accession numbers should be stored in databases. They only specify how they should be displayed in a written grant application.
