
Talk:Citoid/2021

From mediawiki.org
Latest comment: 3 years ago by Diegodlh in topic website specifications for citations

Previous archives are at /Archive 1

Nodejs version for citoid


Hello! While following the instructions on the Citoid page, I discovered that the npm install step for citoid does not work with my current Node installation. I'm using version 12.20 from the NodeSource PPA on Ubuntu 20.04, and I get errors like:

gyp ERR! command "/usr/bin/node" "/usr/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /home/giuseppe/citoid/citoid/node_modules/heapdump
gyp ERR! node -v v12.20.1
gyp ERR! node-gyp -v v5.1.0
gyp ERR! not ok 

If I instead perform this step in an Ubuntu Docker instance with Node.js from the standard repository (v10.19), it works fine. Do you have any suggestions on how to overcome this issue without changing the installed version of Node? Thanks! (PS: I posted this question on @Mvolz (WMF)'s talk page a few days ago.) Baruneju (talk) 21:58, 23 January 2021 (UTC)

Hello, unfortunately we only support Node 10, as that's what we run in production. I don't know much about your production environment, but you can use nvm to run multiple versions of Node on the same machine. Not sure this helps! Mvolz (WMF) (talk) 15:30, 28 January 2021 (UTC)
(I've updated the docs now though, thanks for pointing this out!) Mvolz (WMF) (talk) 15:31, 28 January 2021 (UTC)
Thank you, I'm not very familiar with the Node ecosystem, but I'll give it a try! Luckily, my production environment is still clear, so I'm going to have this issue only on the development machine. Keep up the good work! Baruneju (talk) 00:09, 1 February 2021 (UTC)

A visual Zotero/Citoid translator editor?

Hi all! I'm thinking of applying for a software grant [1] to develop a visual Zotero/Citoid translator editor. Before writing the proposal, I would really appreciate your feedback about the idea, as summarized here [2]. Thanks! Diegodlh (talk) 00:35, 9 March 2021 (UTC)
A proposal has been presented here. We would appreciate your thoughts, comments and questions on its discussion page, as well as your endorsements if you would like to support it. Thank you! Diegodlh (talk) 22:47, 16 March 2021 (UTC)
In principle, it sounds like a great idea to me. @Czar could probably give you more useful advice. Whatamidoing (WMF) (talk) 22:13, 10 March 2021 (UTC)
Scann and I are happy to announce that our grant proposal has been approved!! We are now putting together an Advisory Board to help us think through and tackle critical aspects of the project, and build sustainability and community involvement. We will be thrilled to have anyone from the Citoid team (or interested in Citoid) on board! Applications are open here. Thank you!! Diegodlh (talk) 00:01, 23 July 2021 (UTC)

Running as a separate service


Has anyone figured out whether it's possible to run this in a separate "location" from the main MediaWiki installation, as was possible with the older Node.js Parsoid, and have the two link to each other? I run my wiki on a managed server in virtual Docker containers. It's not possible to install Node.js in the same container as the main Apache container, so I would have to run it in a separate container. I know many people had similar problems with VisualEditor and the older JavaScript Parsoid. InnerCitadel (talk) 10:58, 1 May 2021 (UTC)

Wikilinking in citoid?


How far are we from the possibility of Citoid being able to automatically include wikilinks to publishers that have a page? (I realize consensus on whether we ought to do so would need to be established, but I'm wondering here more about the technical aspect. Personally, I think we should.) Sdkb talk 03:38, 7 July 2021 (UTC)

There's a Phabricator ticket for it here, with a brief discussion of the technical aspects: https://phabricator.wikimedia.org/T212112
I would say, personally, pretty far. From reading the page above, I see you've also encountered the problem of ambiguous publisher names! One way to disambiguate names might be to use Wikidata (which has "synonyms", and you can use the "instance of" or "subclass of" properties to make sure you've got the right sort of item). But doing this reliably enough that it can be automatic, without requiring user input, would involve quite a substantial change to how citoid works in the back end (which right now is a glorified web scraper).
There are front-end approaches too, though, which might be better, at least for Citoid in VisualEditor. Mvolz (WMF) (talk) 07:31, 7 July 2021 (UTC)

No spaces in citation output


The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


It drives me absolutely bonkers that the output doesn't put a space before every pipe / after the field content in the citation it generates.

For example, generated as: {{Cite journal|last=Forsman|first=R W|date=1996-05-01|title=Why is the laboratory an afterthought for managed care organizations?|url=https://academic.oup.com/clinchem/article/42/5/813/5646564|journal=Clinical Chemistry|language=en|volume=42|issue=5|pages=813–816|doi=10.1093/clinchem/42.5.813|issn=0009-9147}}

I want it to output as: {{Cite journal |last=Forsman |first=R W |date=1996-05-01 |title=Why is the laboratory an afterthought for managed care organizations? |url=https://academic.oup.com/clinchem/article/42/5/813/5646564 |journal=Clinical Chemistry |language=en |volume=42 |issue=5 |pages=813–816 |doi=10.1093/clinchem/42.5.813 |issn=0009-9147}}


What file or files in the Citoid extension must I modify in order to add a space before every pipe / after field text?

EDIT: I'll add that I do 99 percent of the entry on our wiki, and I almost exclusively work in the source editor. The only reason I added Citoid is because of years and years of manually typing citations out. The bummer, of course, is that the Automatic feature of Citoid only works in VisualEditor, not in the source editing screen. But the time savings of automatically generating citations was tantalizing. Yet because most of my work is in source editing, I still want slightly better readability of the generated citations, and thus I want to add a space before every pipe. Hopefully this is a relatively easy thing for me to modify?


Thank you. Lostraven (talk) 17:51, 4 August 2021 (UTC)

After a lot of digging, I found a Phabricator ticket from 2020. Does Citoid not manage the output to the text editor? Or is it VisualEditor, as this ticket suggests? Digging further. Lostraven (talk) 19:44, 4 August 2021 (UTC)
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.
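
For anyone who wants the spaced layout before the output itself changes, the reformatting requested in this thread can be approximated by post-processing the generated wikitext. The following is a minimal Python sketch, not part of Citoid or VisualEditor; the function name `space_before_pipes` is hypothetical. It assumes the template contains no piped wikilinks or nested templates, since those pipes would also gain a leading space.

```python
import re


def space_before_pipes(template: str) -> str:
    """Put exactly one space before every pipe in a citation template.

    Any whitespace already in front of a pipe is collapsed to a single
    space, so running the function twice gives the same result.
    """
    return re.sub(r"\s*\|", " |", template)


compact = "{{Cite journal|last=Forsman|first=R W|date=1996-05-01|issn=0009-9147}}"
print(space_before_pipes(compact))
```

Because every pipe is treated the same way, values such as `[[Target|label]]` would need a more careful parser than this regular expression.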

Adding new fields after automatic generation


Oftentimes, after attempting to use the "automatic" feature to generate a citation, the output is missing several fields that could be filled (most commonly authors and dates). It would be helpful to be able to add these fields manually in a UI after the generation as well. In VE you can do it by inserting the citation and then clicking Edit on the generated template, but this is a bit clunky, and when doing source editing in the 2017 wikitext editor it is impossible. Would it be possible to have another step after the generation to "fill in the blanks", or at least to edit them after the fact in the 2017 wikitext editor? Thanks! Eviolite (talk) 04:05, 8 September 2021 (UTC)

If memory serves, someone else suggested this several years ago. I'll see if I can find the phab: number. I don't expect the WMF's Editing team to work on this tool for a while (they're focused on the Talk pages project this year), so it may be a long time before we get what we want. Whatamidoing (WMF) (talk) 19:33, 8 September 2021 (UTC)
Izno in the Wikimedia community Discord server found it; it's phab:T174585. Apologies for not replying with it earlier, but thanks for the update. Eviolite (talk) 19:39, 8 September 2021 (UTC)

DOI, ISBN and ISSN sources


What are the source databases for generating references from DOI, ISBN and ISSN? Juandev (talk) 10:25, 14 October 2021 (UTC)

  • For DOI, it is the resolved URL itself.
  • For ISBN, I guess it is WorldCat. You may find the evaluation here.
  • ISSN has no resolver, since a series of journal issues over decades is not a single article that could be the subject of a citation.
The available codes are: DOI, ISBN, PMC, PMID.
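
To illustrate the four identifier kinds listed above, here is a small Python sketch that guesses which kind a search string looks like and builds the standard doi.org resolver URL for a DOI. The patterns and the names `classify_identifier` and `doi_to_url` are illustrative assumptions, not taken from Citoid's source, and real identifiers have more edge cases than these regular expressions cover.

```python
import re

# Illustrative patterns only; they are NOT copied from Citoid.
PATTERNS = [
    ("DOI", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("PMC", re.compile(r"^PMC\d+$", re.IGNORECASE)),
    ("ISBN", re.compile(r"^(?:\d{9}[\dXx]|\d{13})$")),
    ("PMID", re.compile(r"^\d{1,8}$")),
]


def classify_identifier(raw: str):
    """Return 'DOI', 'ISBN', 'PMC', 'PMID', or None if nothing matches."""
    text = raw.strip()
    # ISBNs are often written with hyphens or spaces, so compare a compact form.
    compact = text.replace("-", "").replace(" ", "")
    for label, pattern in PATTERNS:
        candidate = compact if label == "ISBN" else text
        if pattern.match(candidate):
            return label
    return None


def doi_to_url(doi: str) -> str:
    """Build the resolver URL for a DOI; the page it redirects to is what gets cited."""
    return "https://doi.org/" + doi.strip()


print(classify_identifier("10.1093/clinchem/42.5.813"))
print(doi_to_url("10.1093/clinchem/42.5.813"))
```

An ISSN such as 0009-9147 matches none of the patterns, which is consistent with the note above that ISSN has no resolver.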

Translation


Why is most of the text not marked for translation? DDPAT (talk) 20:32, 20 November 2021 (UTC)

Citoid/Enabling_Citoid_on_your_wiki has a subset of the information and has better translation coverage. Mvolz (WMF) (talk) 09:35, 24 November 2021 (UTC)

website specifications for citations


Where are the specifications for what websites need to do to enable citoid to generate citations from URLs?

I presume these are W3C specifications and have to do with HTML tags, but what documentation do we use to explain what website operators need to do to align themselves with citoid? Is there anything on the Wikimedia platform?

I am imagining tags like author, title, date, etc. Where is the full list of what citoid accepts, along with instructions? Blue Rasberry (talk) 22:39, 21 November 2021 (UTC)

I suggest you read en:Zotero as an introduction and continue by crawling zotero.org. The official documentation is somewhere there, since you ask for a full list.
There are known websites where information can be retrieved individually; for unknown websites, general methods like en:Dublin Core will be tried. PerfektesChaos (talk) 16:49, 23 November 2021 (UTC)
I might be lost. I was expecting Wikimedia documentation, and I think you are suggesting that compliance with Zotero is equivalent to compliance with Wikipedia. Is this the case, and will this be the case for the foreseeable future?
I looked in Zotero's FAQ. They seem to be speaking to users, not to webmasters. I am looking for advice for webmasters.
I was looking for the specifications which Wikipedia citation lookup services use to generate citations. For example, if I input a New York Times URL, then somehow the tool knows that NYT has specified title, author, date, and other fields. Many other less-developed websites do not give this information to the wiki lookup tool. I want to know what NYT is doing, or rather, where the authoritative web recommendations are which specify what websites should do to be machine readable. I want there to be a Wikipedia recommendation page for webmasters telling them what to do.
If the answer is "there seems to be no discussion of this for MediaWiki / Wikimedia", then that is useful information. Is Dublin Core the recommendation which Wikipedia editors should give to the world's web developers in Wikimedia documentation for how they should maximize their compatibility with this platform and receive their best citations?
Thanks, Blue Rasberry (talk) 21:08, 23 November 2021 (UTC)
This sort of thing used to be handled by citoid; now it's handled by the Zotero translation-server directly. If a website is not handled by a specific website translator (as the New York Times is; the full list is at https://github.com/zotero/translators), it is handled by the embedded metadata translator (https://github.com/zotero/translators/blob/master/Embedded%20Metadata.js), which has support for Highwire metadata, as well as the following ontologies:
bib: "http://purl.org/net/biblio#",
bibo: "http://purl.org/ontology/bibo/",
dc: "http://purl.org/dc/elements/1.1/",
dcterms: "http://purl.org/dc/terms/",
prism: "http://prismstandard.org/namespaces/1.2/basic/",
foaf: "http://xmlns.com/foaf/0.1/",
vcard: "http://nwalsh.com/rdf/vCard#",
link: "http://purl.org/rss/1.0/modules/link/",
z: "http://www.zotero.org/namespaces/export#",
eprint: "http://purl.org/eprint/terms/",
eprints: "http://purl.org/eprint/terms/",
og: "http://ogp.me/ns#", // Used for Facebook's OpenGraph Protocol
article: "http://ogp.me/ns/article#",
book: "http://ogp.me/ns/book#",
music: "http://ogp.me/ns/music#",
video: "http://ogp.me/ns/video#",
so: "http://schema.org/",
codemeta: "https://codemeta.github.io/terms/",
rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
And then, as a fallback, some lower-quality metadata.
Of the available ones, I think from a webmaster's perspective eprints would give the highest quality results, because this standard is really designed for citations and most closely matches Zotero's internal standard; it would be good for journal articles, newspaper articles, and websites. By contrast, Dublin Core is more common but doesn't always map that nicely. For music and video, Facebook's Open Graph metadata standard might be better, but I'm not really sure.
As for how the metadata should be included in the page, putting the metadata tags in the HTML itself is safest. Zotero has support for RDF but citoid doesn't; citoid has support for JSON-LD but Zotero doesn't. Mvolz (WMF) (talk) 09:16, 24 November 2021 (UTC)
Hi, @Mvolz (WMF)!
You say that "citoid has support for json-ld but Zotero doesn't". I'd appreciate it if you could elaborate on this, please.
I see that citoid's Scraper uses (here) html-metadata lib's parseAll function which does support JSON-LD; it returns a promise that resolves to a metadata object with a jsonLd property.
However, this metadata object is passed to the matchIDs function (here), which does not seem to use this jsonLd property.
The metadata object is then passed to the addMetadata function (here), and inside it to the addItemType function (here), none of which seem to use its jsonLd property either.
Finally, the data in the jsonLd property doesn't show in the final citoid response (see T270816).
Am I missing something? Thanks! Diegodlh (talk) 04:18, 24 February 2022 (UTC)
For a while we used citoid's native translator for a lot of websites, but at some point we switched to using Zotero for everything unless Zotero fails or goes down. Since the Zotero translator doesn't support JSON-LD, it won't show up unless Zotero fails (which happens rarely). The issue is tracked in Zotero here:
https://github.com/zotero/translators/issues/917 Mvolz (WMF) (talk) 12:30, 24 February 2022 (UTC)
Thanks, Marielle. I switched Zotero translation off in my local citoid instance (setting `zotero` to `false` in the `config.yaml` file), but although the output citation changed (as expected), the JSON-LD is still not present.
I reviewed the `addMetadata` function in the `Scraper` module and I still don't see where the `html-metadata`'s `jsonLd` is being used. I see there are custom (citoid) translators for highwire, bepress, opengraph, etc., but I don't see a translator for JSON-LD.
Am I missing something? Diegodlh (talk) 15:51, 24 February 2022 (UTC)
Which website are you scraping? We only parse JSON-LD if it's in the HTML; if it's in a linked file it won't get scraped. Mvolz (WMF) (talk) 13:47, 23 March 2022 (UTC)
Hi, @Mvolz (WMF). Sorry for the delay.
See for example https://www.perlego.com/book/1431388/qualitative-research-practice-a-guide-for-social-science-students-and-researchers-pdf?queryID=8d25693afbbc254b9927e5d0f7dac19f&searchIndexType=books.
The correct item type (book) and author names are available in one of the JSON-LD objects, and are in fact available in the metadata object returned by html-metadata's parseAll (see my original comment above).
However, Citoid (with Zotero turned off) returns a wrong item type (webpage) and no author names. Diegodlh (talk) 14:37, 12 April 2022 (UTC)
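
To make the point about embedded JSON-LD concrete, the following Python sketch pulls JSON-LD objects out of `<script type="application/ld+json">` elements using only the standard library. It is not Citoid's or html-metadata's code; the class name `JsonLdExtractor`, the helper `extract_jsonld`, and the sample page (a made-up book record, not the Perlego page) are all hypothetical. It only shows that the item type and author names are present in the HTML itself when the JSON-LD is inline.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every inline JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html: str):
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page, for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Book",
 "name": "An Example Book",
 "author": [{"@type": "Person", "name": "Ada Example"}]}
</script>
</head><body></body></html>
"""

items = extract_jsonld(SAMPLE_HTML)
print(items[0]["@type"], [a["name"] for a in items[0]["author"]])
```

JSON-LD served from a linked file would not be seen by this approach, matching the limitation mentioned above.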
As a provider of HTML documents I would offer multiple general metadata simultaneously.
and more.
Wow, thanks, this is what I wanted. This is a bit heavy for me, so I will read and think for a while. Thanks a lot for the answers, opinions, and the links. Blue Rasberry (talk) 15:47, 24 November 2021 (UTC)
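
As a rough illustration of offering several metadata vocabularies at once, as suggested in this thread, the Python sketch below generates `<meta>` tags for one article in Highwire Press style (`citation_*`), Dublin Core (`DC.*`), and Open Graph (`og:*`, `article:*`). The function name `build_meta_tags` and the sample values are hypothetical, and the tag selection is an assumption about commonly used names rather than a specification of what Citoid or Zotero reads. The eprints vocabulary recommended above is left out because its exact tag names are not given in this discussion.

```python
from html import escape


def build_meta_tags(title, authors, date, site_name):
    """Return HTML <meta> tags describing one article in three vocabularies."""
    tags = []

    def meta(attr, key, value):
        # Escape the value so quotes and ampersands cannot break the attribute.
        tags.append('<meta {}="{}" content="{}">'.format(attr, key, escape(value)))

    # Highwire Press style tags
    meta("name", "citation_title", title)
    for author in authors:
        meta("name", "citation_author", author)
    meta("name", "citation_publication_date", date)

    # Dublin Core
    meta("name", "DC.title", title)
    for author in authors:
        meta("name", "DC.creator", author)
    meta("name", "DC.date", date)

    # Open Graph
    meta("property", "og:title", title)
    meta("property", "og:site_name", site_name)
    meta("property", "og:type", "article")
    meta("property", "article:published_time", date)

    return "\n".join(tags)


print(build_meta_tags("Tom & Jerry at Work", ["Ada Example"], "2021-11-24", "Example News"))
```

The generated lines belong in the page's `<head>`, in the HTML itself, which this thread describes as the safest place for such metadata.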