Talk:Citoid

About this board

Previous archives are at /Archive 1

Boghog (talkcontribs)

Hi. I posted this 9 months ago under Bugs: pmc, redundant urls, and ref tags. At that time, only the pmc accession number prefix (issue #1) was discussed, and redundant URLs (issue #2) got lost in the discussion. Therefore I would like to raise the second, undiscussed issue again, this time with more background.

Redundant url parameters are often added when there is another more appropriate and compact parameter that produces exactly the same link. Examples include:

IMHO, Citoid should check the url, and if it matches any of the above patterns, then the more specialized parameter should be populated and the url parameter should be left blank. As it stands now, Citoid is adding both the specialized parameter and the redundant url parameter.
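The check proposed above could look roughly like the following. This is only a sketch, not how Citoid works: the pattern table and the function name `drop_redundant_url` are hypothetical. The PMC and doi.org URL shapes are taken from this thread, and the PubMed shape is an assumption.

```python
import re

# Hypothetical pattern table: URL shapes that duplicate a specialized
# citation parameter. The PMC and (dx.)doi.org shapes appear in this thread;
# the PubMed shape is an assumption.
REDUNDANT_URL_PATTERNS = {
    "pmc": re.compile(r"^https?://www\.ncbi\.nlm\.nih\.gov/pmc/articles/PMC(\d+)/?$"),
    "pmid": re.compile(r"^https?://www\.ncbi\.nlm\.nih\.gov/pubmed/(\d+)/?$"),
    "doi": re.compile(r"^https?://(?:dx\.)?doi\.org/(10\.\S+)$"),
}


def drop_redundant_url(citation):
    """Return a copy of `citation` without `url` when it only duplicates an identifier."""
    result = dict(citation)
    url = result.get("url", "")
    for param, pattern in REDUNDANT_URL_PATTERNS.items():
        match = pattern.match(url)
        if match:
            ident = match.group(1)
            # Fill the specialized parameter if it is missing; drop the URL
            # only when it points at the same identifier.
            if result.setdefault(param, ident) == ident:
                del result["url"]
            break
    return result
```

With this sketch, a citation whose only link is `https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5615317/` would end up with `pmc` set to `5615317` and no `url`. Any other URL is left untouched.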

PerfektesChaos (talkcontribs)

Basically you are right.

The German template for printed material w:de:Template:Literatur does support DOI= as well as PMID= and PMC=.

Furthermore, German templates detect URLs like those above, issue error messages, and urge editors to use the plain ID rather than the URL.

Note that dx.doi.org recently changed to doi.org in new usages.

Whatamidoing (WMF) (talkcontribs)

I understand that most readers don't know that clicking the PMID/doi/etc. numbers will take them to a useful page, but they do understand that clicking an article title that looks like an external link is going to take them to the article. From the POV of a non-technical reader, having the "redundant" link is very helpful.

Boghog (talkcontribs)

I think you are underestimating the intelligence and curiosity of the average reader. Readers understand that blue text contains a hyperlink and that links generally lead to useful information. At the very least, if a reader is interested in a source and sees a link in the source, they are liable to click on it to see where it leads.

Furthermore the pmc parameter will also link the article title. Therefore |url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5615317/ is completely redundant with |pmc=5615317.

Reply to "Redundant URLs"

JSTOR is sometimes a journal and sometimes a website

AManWithNoPlan (talkcontribs)

https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/https%3A%2F%2Fwww.jstor.org%2Fstable%2F3073766

Sometimes the json is correct (a journal with doi, etc), and other times (often), Citoid thinks this is a website. I find this confusing, since it is not consistent.
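To make the inconsistency easier to track over time, the request URL can be built and the returned item type inspected with something like the following. It is a minimal sketch: the base path is copied from the URL quoted above, `citoid_url` and `item_types` are hypothetical helper names, and fetching and JSON decoding are left to the caller.

```python
from urllib.parse import quote

# Base path copied from the request URL quoted in this thread.
CITOID_BASE = "https://en.wikipedia.org/api/rest_v1/data/citation"


def citoid_url(query, fmt="mediawiki"):
    """Build a Citoid request URL; the query (a URL or DOI) is fully percent-encoded."""
    return f"{CITOID_BASE}/{fmt}/{quote(query, safe='')}"


def item_types(response):
    """Return the itemType of each citation in a decoded Citoid response (a JSON list)."""
    return [item.get("itemType") for item in response]
```

Logging `item_types(...)` for the same JSTOR URL at intervals would show when the answer flips between a journal article type and `"webpage"`.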

AManWithNoPlan (talkcontribs)

More information: once Citoid goes bad, it stays bad for a while, and then suddenly works again. I have tried coming in from other IP address ranges and with other JSTOR IDs just to make sure that I was not personally being blocked. Thoughts: JSTOR somehow blocks you, but you still get a title with “ - on JSTOR” appended. Alternatively, there are multiple, differing Citoid servers running, but then I would expect the failures to be more random and not last for a period of time.


404 URL returns data anyway that is weird

AManWithNoPlan (talkcontribs)

When I submit this URL,

https://www.thecaterer.com/business/companies/33812/barracuda-group-limited

Citoid acts like it is real, even though it knows that it is a 404, and sends this back:

[
 {
   "url": "https://www.thecaterer.com/business/companies/33812/barracuda-group-limited",
   "itemType": "webpage",
   "websiteTitle": "{{metaTags.other['og:site_name']}}",
   "title": "{{metaTags.other['og:title']}}",
   "abstractNote": "{{metaTags.other['og:description']}}",
   "accessDate": "2017-11-20",
   "author": [
     [
       "",
       "{{metaTags.other.author}}"
     ]
   ],
   "source": [
     "citoid"
   ]
 }
]

Mvolz (WMF) (talkcontribs)

Thanks for reporting!

Unfortunately this seems to be a problem with the website. Whilst the website *says* it's a 404, it's not actually a real 404 page. The HTTP response header claims it's a 200 OK response :D. The weird metadata is also because they're using a templating system that returns the raw template tags in the metadata of the page.

Unfortunately I'm not sure how actionable this is on our end; perhaps the best thing to do is to report the issue to the webmaster of the website.

Jonesey95 (talkcontribs)

Citoid should reject any proposed citation data with double curly braces in it. It causes error messages.
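A check of this kind could be as simple as the sketch below. It is an illustration of the suggestion, not existing Citoid code, and `contains_template_markup` is a hypothetical name. It walks the citation data and flags any string value that contains double curly braces.

```python
def contains_template_markup(value):
    """Recursively check citation data for '{{' or '}}' in any string value."""
    if isinstance(value, str):
        return "{{" in value or "}}" in value
    if isinstance(value, dict):
        return any(contains_template_markup(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(contains_template_markup(v) for v in value)
    # Numbers, booleans, and None cannot carry template markup.
    return False
```

Applied to the response quoted above, the check would flag the `websiteTitle`, `title`, and `abstractNote` fields. Whether to reject the whole citation or only blank the affected fields is a separate design decision.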

Mvolz (WMF) (talkcontribs)

Yeah, VE should be escaping this: probably https://phabricator.wikimedia.org/T143453?


Bad JavaScript Error "OO.ui.TabPanelLayout is not a constructor"

Janiko (talkcontribs)

Hi all, I hope this is the right place to get some help. I'm trying to install the Citoid extension on a personal wiki with VisualEditor, and I get a bad JS error.

I first installed MediaWiki (1.29), enabled only a couple of extensions (Cite, Scribunto, TemplateData...), and VisualEditor. All works fine as expected. Besides, I have a parsoid+citoid (+ zotero/translation-server) server, working well too. I followed the setup instructions (here, here, etc.).

When I enable the Citoid extension, I get no MediaWiki error (in debug mode). But when I try to use the “Source” function, I get 2 JS errors that I don’t get without Citoid (the “Source” function in VE works well with Citoid not enabled).

I tried different browsers and different skins; the result remains the same.

What I can see in the browser’s console is below. The first error makes me think that oojs-ui is not loaded properly, though it’s installed (with composer).

Where should I look to find the cause?

Thanks !

Janiko

(1) Uncaught TypeError: OO.ui.TabPanelLayout is not a constructor
   at VeUiCiteFromIdInspector.ve.ui.CiteFromIdInspector.initialize (load.php?...)
   at VeUiCiteFromIdInspector.OO.ui.Window.setManager

Code:
    this.modePanels = {
        auto: new OO.ui.TabPanelLayout('auto',{
            label: ve.msg('citoid-citefromiddialog-mode-auto')...

(2) Uncaught TypeError: Cannot read property 'setDisabled' of undefined
   at VeUiCiteFromIdInspector.<anonymous>

Code:
    ve.ui.CiteFromIdInspector.prototype.getSetupProcess = function(data) {
        return ve.ui.CiteFromIdInspector.super.prototype.getSetupProcess.call(this, data).next(function() {
            var fragment;
            this.lookupPromise = null;
            this.staging = !1;
            this.results = [];
            this.lookupButton.setDisabled(true);
            this.inDialog = data.inDialog || '';
            this.replaceNode = data.replace &&...

Mvolz (WMF) (talkcontribs)

To me this sounds most likely to be a version mismatch. Branches get cut for MediaWiki weekly, and you'll need to have the same branch for VisualEditor, Citoid, and OOUI/OOJS, as they won't be compatible with other branches (although composer should be giving you the correct version of OOUI :/). I seem to remember the tab layout getting changed a little while back.

Where are you getting your versions? From git, or packaged somewhere?

The best place to ask for help would probably be somewhere there are VisualEditor devs, #mediawiki-visualeditor on IRC maybe? https://www.mediawiki.org/wiki/VisualEditor/Setup Or you might actually have more luck with people experienced in dealing with versioning issues i.e. #wikimedia-tech

If all else fails you can report a bug, https://phabricator.wikimedia.org/ and tag with VisualEditor/citoid.

Janiko (talkcontribs)

Thank you for your answer. Unfortunately, I've reinstalled VE and Citoid, checked the branch label (both VE and Citoid cloned with the -b REL1_29 option), rerun git submodule update --init in the VE directory, and I still get the same behaviour: VE works well until I try to load Citoid. I will try on Phabricator.


Publication date field for some scientific journals are not provided anymore

Albert Ke (talkcontribs)

Hi,

It looks like for certain scientific journals the 'publication date' field isn't provided anymore. For example for the "Earth Surface Dynamics" journal you will see the publication date (See e.g.: https://en.wikipedia.org/api/rest_v1/data/citation/zotero/10.5194%2Fesurf-5-21-2017 which has the publication date provided as: "date":"2017-01-16").

However, the "Geology" journal, published by "geoscienceworld", has no publication date through zotero (See e.g.: https://en.wikipedia.org/api/rest_v1/data/citation/zotero/10.1130%2Fg38665.1 (only access date is shown)).

Would there be a quick fix such that publication date can be pulled through zotero as well?

Thanks, Albert.

Whatamidoing (WMF) (talkcontribs)

I don't know how to check for this myself, but usually, when something like this happens, it means that the 'broken' sources have rearranged their websites, and that the Zotero translator needs to be updated as a result.

Albert Ke (talkcontribs)

Thank you @Whatamidoing (WMF). I'm a bit new to these Zotero formats. Is this where the translators are: https://github.com/zotero/translators?

And if so, would the 'crossref' translator be the one to edit given that citoid links to crossref when searching on doi by: https://en.wikipedia.org/api/rest_v1/data/citation/zotero/10.1130%2Fg38665.1

Thanks, Albert.

Whatamidoing (WMF) (talkcontribs)

I don't know how to edit the Zotero translators. However, @Czar and @Mvolz do.

Czar (talkcontribs)

Looks like Citoid does its own DOI import (not through Zotero): https://github.com/zotero/translators/issues/1476

Mvolz will be best equipped to handle this. We can continue at phab:T180408

ערן (talkcontribs)

Is Scribunto really needed for working with Citoid? If not, consider removing it from the documentation here.

Reply to "scribunto"

Confusing publisher and via parameters?

SMcCandlish (talkcontribs)

Someone mentioned to me that this tool is incorrectly outputting values like |publisher=Google Books, at least for en.wikipedia citation templates. If it's still doing that, it needs to be fixed ASAP to correctly use the |via= parameter for such intermediary distributors as Google Books, YouTube, Project Gutenberg, PubMed, JSTOR, etc. (if it hasn't been fixed in this regard already). I don't use VE, so I'm not even sure how to test this.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  04:55, 24 July 2016 (UTC)

Jc3s5h (talkcontribs)

Last I checked, Project Gutenberg retypes the text, they don't just scan it. So their books are new editions and they are the publisher. Citation templates don't provide any mechanism to show that an earlier edition was published by a different publisher.

SMcCandlish (talkcontribs)

Wikipedia would still not treat them as a publisher, and having tools automatically do so is misleading and wrong for our implementation of source citations. Project G. is a republisher, and that is what |via= is for, even if they did some hand cleanup of their OCR (and, yes, they do use OCR). It's no different from converting a book to PDF and then eBook format. That doesn't make you magically a new publisher, it just means you've done the work (including any after-automation cleanup) to format-shift something.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  18:50, 27 July 2016 (UTC)

Jc3s5h (talkcontribs)

I think the via parameter would imply that the republication is page-for-page and line-for-line identical to the original publication. Frequently in the past republications would be repaginated, so a passage that appeared on page 100 in the original might be on page 95 in the republication. I believe this was the case in the early days of Project G., although maybe not the more recent publications. Certainly an edition with different page numbering than the original should be treated as new editions. If in doubt, the presumption should be it is a new edition, to avoid making a false claim about what page a passage occurs in the original (which the citing editor has never seen).

SMcCandlish (talkcontribs)

Well, it doesn't imply that. Electronic versions of documents are very often not "page-for-page and line-for-line identical" to the paper version, unless painstakingly made that way, usually in PDF form. If I write a book and release in PDF form through O'Reilly by special arrangement with them, and (within our licensing parameters) you use some tool to convert it to Kindle format, and this changes the layout in some ways, you don't get to claim to be my book's publisher. Doing so would actually reduce the apparent reliability of the source, since you're just some random person, not a well-known publisher. Per en:w:WP:SAYWHEREYOUGOTIT we do want a |via= parameter identifying that this is a copy from some intermediary source and not straight from the actual publisher.

Citing specific page numbers in e-documents is generally pointless unless they are in fact exact PDF scans; we have the |at= parameter to identify where in an electronic document the material can be found. E.g., I would use this to cite the online edition of the Chicago Manual of Style by section number, since it doesn't even have page numbers. Intelligent use of |at= allows people to find the same part in a paper edition, too.  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  22:31, 28 July 2016 (UTC)

Jc3s5h (talkcontribs)

SMcCandlish wrote " If I write a book and release in PDF form through O'Reilly by special arrangement with them, and (within our licensing parameters) you use some tool to convert it to Kindle format, and this changes the layout in some ways, you don't get to claim to be my book's publisher."

Yes, I do. Whatever contractual agreements got put in place among you, me, and O'Reilly allows me to. Of course, if I'm violating copyright, my version shouldn't be cited at all. Or if it happens in the year 2200 and your copyright has expired, then I don't need anyone's permission to create a new edition.

Really no different than Bloomsbury being the original publisher of the Harry Potter books but Scholastic being the publisher for the English North America editions.

SMcCandlish (talkcontribs)

Taking a file and running a conversion program on it is nothing at all like Scholastic typesetting, designing a new cover for, creating new frontmatter for, printing, and distributing a NAm edition of a book originally by Bloomsbury. I repeat: What you are talking about is nothing but format-shifting. It is no different from you posting a piece of digital art at DeviantArt, and me (pursuant to permissive licensing terms) putting a copy of it on my Facebook feed; which entails a new copy there, and a re-encoding, i.e. a format shift, and me and Facebook distributing the work to new people. Neither I nor Facebook become the publisher; DeviantArt remains the publisher, Facebook is the |via=. I suppose a philosophical argument can be made that they are two different kinds of publishing really, but who cares? The format-shifting and additional distribution isn't "publishing" for WP citation purposes.

This distinction is the very reason that the |via= parameter was created, to stop mis-attributing format-shifted and other repostings by random pseudo-publishers and content aggregators as the |publisher=, but retain the name of the actual publisher as such, and the name of the online distributor, so that people can find the work in the original form, not just on some possibly short-lived website, but can also use that website for convenience, and not be confused about the difference. For all we know, Google Books or Project Gutenberg could disappear tomorrow forever. The distinction is especially important for any entity that both reformats and distributes (|via=) material on behalf of external, traditional publishers, and also acts as the publisher itself, for new (generally amateur) content. Amazon is already doing this, and this kind of business model shift can happen at any time (e.g. HBO, Netflix, and Amazon are all publishers of original television and e-TV series, when formerly they were, respectively, a cable redistributor, a by-mail and later online stream redistributor, and an e-tailer, of previously published content). So, already, any such entity could appear as a |publisher= or a |via=, for different sources in the same article, and the distinction in each case would matter.

When it comes to historical sources, the original publisher information is also often of pertinent, even crucial, value, since significant differences can exist between the 1645 version of something from a London publisher, and a 1672 edition produced in Dublin, without any intermediary e-distributor like Project Gutenberg even being aware of it. Or – and this is telling – they often are aware of it, and so is Google Books, and take pains to note the actual publisher. Neither service claims to be the publisher of such works, and it is a weird form of original research for WP to insist that they are.

With that, I'm kind of tired of arguing round in circles on this stuff, and don't need to keep at it. We have separate parameters for these things for both a citation accuracy and utility reason (helping readers find and use sources) and a policy reason, en:w:WP:SAYWHEREYOUGOTIT, and neither the separation of these parameters nor the rationales for the separation are going to go away just because you don't see it the same way. I could even be totally wrong about every single thing I've said other than the last sentence and it wouldn't make any difference, since there's already a consensus to keep them separate, and it is not necessary for my analysis of why to be correct (though it is).  — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  10:40, 1 August 2016 (UTC)

SMcCandlish (talkcontribs)

PS: I posted a cross-reference to this discussion, at en:w:Help talk:Citation Style 1, and it also turns out there is already an active thread about this over there:
  https://en.wikipedia.org/wiki/Help_talk:Citation_Style_1#A_Meta_discussion_on_the_difference_between_via_and_publisher
 — SMcCandlish ¢ ≽ʌⱷ҅ʌ≼  14:23, 1 August 2016 (UTC)

David Eppstein (talkcontribs)

Perhaps the point has been missed here in the back-and-forth. Wikipedia citations use "publisher" to mean the original publisher of an edition of a work. Some of our information providers use the same keyword for a different meaning, the most recent content provider. We should not mix up these two meanings merely because they use the same keyword. If information providers are using "publisher" to mean something different than what we want it to mean, Citoid should not be blindly copying them.

SMcCandlish (talkcontribs)

Agreed, entirely. If the most recent content provider isn't the real publisher of the content, the former should be in the |via parameter. I don't know if there's a practical way to make Citoid aware of a big list of journal aggregators, news aggregators, book scanning sites, etc., to code them as |via instead of |publisher, but I hope so. If WP can maintain a URL blacklist that includes virtually all known URL redirectors (tinyurl.com, etc.), I would think that it could maintain a list of content aggregators (pseudo-republishers).
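A list-based remapping like the one hoped for here might be sketched as follows. Everything in it is illustrative: `KNOWN_AGGREGATORS` contains only the distributors named in this thread (and whether each belongs there is exactly what is being debated), and `move_aggregator_to_via` is a hypothetical helper, not part of Citoid.

```python
# Illustrative list only: the intermediary distributors named in this thread.
KNOWN_AGGREGATORS = {"google books", "youtube", "project gutenberg", "pubmed", "jstor"}


def move_aggregator_to_via(params):
    """Return a copy of citation `params` with an aggregator moved from publisher to via."""
    result = dict(params)
    publisher = result.get("publisher") or ""
    if publisher.strip().lower() in KNOWN_AGGREGATORS:
        # Keep an existing |via= value if one is already set.
        result.setdefault("via", publisher)
        del result["publisher"]
    return result
```

For example, `{"publisher": "Google Books"}` would become `{"via": "Google Books"}`, while a citation whose publisher is not on the list is returned unchanged. The sketch cannot recover the real publisher; that would still have to come from the source metadata or an editor.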

Jc3s5h (talkcontribs)

It isn't quite as simple as original publisher vs. republisher. A republisher that simply copies images of the original publication and makes them available online should probably be named with the via parameter, or similar. But a publisher who re-typesets, and perhaps repaginates, an older work should be regarded as a full-fledged publisher. Some citation styles call for naming the original publisher in this situation, but the Wikipedia citation templates do not have a parameter for this purpose.

SMcCandlish (talkcontribs)

We already covered this above; one of the hazards of "necroposting" on a year-old thread. WP cites sources to help readers identify and find them and to help editors verify our content. We do not do so as a bibliographic database service; the purpose is not to track the history of a work. So, WP has no need of being able to identify a previous publisher's details. If you have a genuinely republished version with new typesetting and pagination, or even just a new foreword/introduction, this is the work you are citing, by that particular publisher. We don't care who published the first edition that had different font, page numbers, or lack of a "50th anniversary" foreword or whatever. It's just not relevant.

[Conceptual aside: It's really no different from a quote being in a New York Times article, perhaps with an "[editorial tweak]" in it, and a reporter's introduction ("According to X. Y. Zounds in The Zounds Method,"). We cite the newspaper article we found the quote in, not the original primary source of the statement (unless we also have that, and have checked it, and it's appropriate to "double-up" the citation for some reason, e.g. because another source misquoted it and caused a controversy). A new edition, a real republication, of a work is a similar matter; the original material being included is essentially a giant quotation, may have been editorially altered in the course of republication, and may have new lead-in material, a big "Foreword" or "Introduction to the Nth Edition" version of a journalist prefacing a quoted statement from a speech or document.]

By contrast, |via is important, for actual WP purposes and in addition to |publisher, to use for cases of pseudo-republishing, i.e. redistribution or format-shifting, such as if you got something via a scanning site or a content aggregator:

  1. That intermediary is incidental and has no effect we care about on the content itself (e.g., we DGaF if it has an aggregator's watermark on it; that isn't substantive and does not constitute an "edition" or a new "publishing" for WP purposes).
  2. The URL or the entire aggregator itself might not be there tomorrow. I have no insider info on the budgets of Project Gutenberg, Internet Archive, Google Books, or the journal aggregators, but these things cost money to operate. We do know that at least the first two of these have had funding struggles in the past, and still publicly seek donations to keep them going. The latter two are things a profit-minded business entity could axe at any moment, or start paywalling, as a simple business decision. The only consequence of such a failure is a dead URL. The actual citation is to the original work and remains valid; the work still exists and can be found. The dead link info is removed from the citation; we do not remove from citations the names of actual publishers who have ceased operation.
  3. It may not be the most convenient or effective way for a particular reader to get the work. Examples: if someone has taken a print-out of the WP article to a public library and all its Internet access kiosks, if there are any, are in use, but the library may have the original work on its shelves; or in a place where Internet access is costly and schlepping down a huge PDF is not practical, but looking at a paper copy you got via inter-library loan is free; or when a journal aggregator is not free for full text, with that only accessible for pay or at institutions with a subscription; and ... insert numerous other scenarios.

[Second conceptual aside: If I have a blog that I publish, and someone cites it, and the site goes down permanently, and it wasn't archived by the Wayback Machine or something equivalent, then that site is gone; i.e., it cannot be used by readers/editors for verification, ergo it is no longer a valid source citation. A conduit for a copy of a publication (e.g. Wayback.Archive.org), and the publication itself (McCandlishWorldNews.com or whatever): a big and clear difference. People seem to have unreasonable difficulty with the distinction, just "because Internet", i.e. because "a website is a website" in many minds; they're confusing the medium for the message, the delivery format for the content.]


List of translators available in Citoid

Lsanabria (talkcontribs)

The only URL with a list of translators that I could find was updated almost a year ago. Is there any way to find out what translators are available in Wikimedia?

Danmichaelo (talkcontribs)

Ping Mvolz. I'm also curious about how often the translators are updated from https://github.com/zotero/translators/ since I just submitted my first translator there :)

Mvolz (WMF) (talkcontribs)

Very intermittently. Once your translator gets merged feel free to ping me somewhere and I'll get it merged upstream :).

Danmichaelo (talkcontribs)

Thanks, Mvolz, it was merged now :)

Mvolz (WMF) (talkcontribs)

https://github.com/wikimedia/mediawiki-services-zotero-translators is the list of our translators. Unfortunately these also include translators which don't work with Citoid (ones which don't have the 'v' under the browser support flag, i.e. this one works and this one doesn't). There used to be tests run daily with the results online, which better showed you which ones work and which ones don't, but those have been broken for a while now.
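As a rough way to apply the 'v' flag test to a checkout of that repository, a translator file's metadata header can be read with a sketch like this. It assumes each translator file begins with a JSON metadata object that has a `"browserSupport"` string, followed by JavaScript. `works_with_citoid` is a hypothetical helper name, and the flag is only an indicator, not a guarantee that a translator works.

```python
import json


def works_with_citoid(translator_source):
    """Guess Citoid compatibility from the 'v' flag in a translator's metadata header."""
    # Assumption: the file starts with a JSON metadata object, followed by
    # JavaScript. raw_decode parses that leading object and ignores the rest.
    header, _end = json.JSONDecoder().raw_decode(translator_source.lstrip())
    return "v" in header.get("browserSupport", "")
```

Running this over every `.js` file in the repository would give a quick, if crude, list of translators that at least declare the flag.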

Czar (talkcontribs)

It would be great to have that translator results tool back up now that Refill is using Citoid. Even a quick and dirty indicator would be good. Otherwise it's harder to diagnose why links aren't expanding properly.

Mvolz (WMF) (talkcontribs)

Tracking in phabricator here: https://phabricator.wikimedia.org/T137440

Czar (talkcontribs)

For posterity, here's the link that tests the Citoid server for site/translator compatibility:

https://zotero-translator-tests.s3.amazonaws.com/index.html


API shows the word "pages" only and I can't see any template fields

Atef81 (talkcontribs)

I have followed all the steps in the documentation over a week ago, and until now the template items don't show in the visual editor citation module.

I followed the steps in this section here: Citoid#Empty_references_appear, but all that appears in the API is this:

{
    "pages": {}
}

The template definitely has data in:

"maps": {
    "citoid": {

and also has the correct TemplateData codes.

I executed the jobqueue several times to make sure there are no pending jobs, and null-edited the templates several times.
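For comparison, a response that does carry the mapping can be checked with a sketch like the one below. It assumes the API response has the shape `{"pages": {<page id>: {"title": ..., "maps": {"citoid": {...}}}}}`; that shape and the helper name `pages_with_citoid_map` are assumptions, not taken from this thread.

```python
def pages_with_citoid_map(api_response):
    """List the titles of pages whose TemplateData includes a 'citoid' map."""
    pages = api_response.get("pages", {})
    return [
        page.get("title", page_id)
        for page_id, page in pages.items()
        if "citoid" in page.get("maps", {})
    ]
```

An empty `"pages"` object, as in the output above, makes this return an empty list. That may indicate that the API found no TemplateData at all for the requested titles, rather than a problem inside the `citoid` map itself.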

Any suggestions are welcome, please. Thanks in advance.

Whatamidoing (WMF) (talkcontribs)

Which wiki are you trying to fix?

Atef81 (talkcontribs)

Sorry, I missed this message. I am trying to fix a MediaWiki installation I have. It is not part of the Wikimedia project. Is this what you were asking, or did I miss the question? Thanks.

Whatamidoing (WMF) (talkcontribs)

That was my question. I don't know how to fix your problem, but if it was a WMF wiki, then I could try to find someone who might be able to fix it for you.

You've already tried all the things I would recommend to you. @Mobrovac-WMF, if you see this ping, do you have any advice?


Citoid: several publishers recently missing?

Albert Ke (talkcontribs)

Hi,

I think Citoid is the most fantastic service for MediaWiki, and I started using it about a year ago to automatically get information on many publications, based on the DOI. Really great work!

Not too long ago, Citoid switched to a new API endpoint, and I was able to connect to that as well, but I noticed that e.g. the Wiley publisher is suddenly no longer supported (though it used to be). Here are some examples of articles for which data can no longer be retrieved:

Wiley publishers: https://en.wikipedia.org/api/rest_v1/data/citation/zotero/10.1046%2Fj.1365-2117.2002.00186.x

AGU publications (part of Wiley): https://en.wikipedia.org/api/rest_v1/data/citation/zotero/10.1029%2F94WR00436

However, the service works great for e.g.:

https://en.wikipedia.org/api/rest_v1/data/citation/zotero/10.1111%2Fj.1751-8369.2002.tb00087.x

Would the Wiley publisher become available again through the API in the foreseeable future?

Thank you!

Albert.

I have posted this on Phabricator as well (https://phabricator.wikimedia.org/T165105)

Mvolz (WMF) (talkcontribs)

Hi Albert,

We have historically had trouble with scraping Wiley - something to do with their redirect system puts us into a redirect loop and can cause timeouts. But the DOI SHOULD work (since then we have the DOI to work with to query crossref even if we can't access the website) so that sounds like a bug. I will look into it further.

Thanks for reporting this!

Albert Ke (talkcontribs)

Appreciated Mvolz,

Hope it is something easy to solve!

Albert.

Whatamidoing (WMF) (talkcontribs)

Update: It looks like this is (mostly?) a problem involving RESTBase. I don't know when it will get fixed, but they're working on it.
