Talk:Citoid/2021
This page used the Structured Discussions extension to give structured discussions. It has since been converted to wikitext, so the content and history here are only an approximation of what was actually displayed at the time these comments were made.
Previous archives are at /Archive 1
Nodejs version for citoid
Hello! While following the instructions from the Citoid page, I discovered that the npm install step for citoid does not work with my current node installation. I'm using version 12.20 from the nodesource PPA on Ubuntu 20.04 and I get errors like
gyp ERR! command "/usr/bin/node" "/usr/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild" gyp ERR! cwd /home/giuseppe/citoid/citoid/node_modules/heapdump gyp ERR! node -v v12.20.1 gyp ERR! node-gyp -v v5.1.0 gyp ERR! not ok
If I instead perform this step on an Ubuntu docker instance with nodejs from the standard repo (v10.19), it works fine. Do you have any suggestions on how to overcome this issue without changing the installed version of node? Thanks! (PS: I posted this question on @Mvolz (WMF)'s talk page a few days ago). Baruneju (talk) 21:58, 23 January 2021 (UTC)
- Hello, unfortunately we only support Node 10, as that's what we run in production. I don't really know much about your production environment, but you can use nvm to run multiple versions of node on the same machine. Not sure this helps! Mvolz (WMF) (talk) 15:30, 28 January 2021 (UTC)
- (I've updated the docs now though, thanks for pointing this out!) Mvolz (WMF) (talk) 15:31, 28 January 2021 (UTC)
- Thank you, I'm not quite familiar with the node ecosystem, but I'll give it a try! Luckily my production environment is still clear, so I'm going to have this issue only on the development machine. Keep up the good work! Baruneju (talk) 00:09, 1 February 2021 (UTC)
A visual Zotero/Citoid translator editor?
- Hi all! I'm thinking of applying for a software grant [1] to develop a visual Zotero/Citoid translator editor. Before writing the proposal, I would really appreciate your feedback about the idea, as summarized here [2]. Thanks! Diegodlh (talk) 00:35, 9 March 2021 (UTC)
- A proposal has been presented here. We would appreciate your thoughts, comments and questions in its discussion page, as well as your endorsements if you would like to support it. Thank you! Diegodlh (talk) 22:47, 16 March 2021 (UTC)
- In principle, it sounds like a great idea to me. @Czar could probably give you more useful advice. Whatamidoing (WMF) (talk) 22:13, 10 March 2021 (UTC)
- Scann and I are happy to announce that our grant proposal has been approved!! We are now putting together an Advisory Board to help us think and tackle critical aspects of the project, and build sustainability and community involvement. We will be thrilled to have anyone from the Citoid team (or interested in Citoid) onboard! Applications are open here. Thank you!! Diegodlh (talk) 00:01, 23 July 2021 (UTC)
Running as a separate service
Has anyone figured out if it's possible to run this in a separate "location" from the main MediaWiki installation, as was possible with the older Node.js Parsoid, and have them link to each other? I run my wiki on a managed server in virtual docker containers. It's not possible to install nodejs in the same container as the main apache container, so I would have to run it in a separate container. I know many people had similar problems with VisualEditor and the older JavaScript Parsoid. InnerCitadel (talk) 10:58, 1 May 2021 (UTC)
Wikilinking in citoid?
How far are we from the possibility of Citoid being able to automatically include wikilinks to publishers that have a page? (I realize consensus on whether we ought to do so would need to be established, but I'm wondering here more about the technical aspect. Personally, I think we should.) Sdkb talk 03:38, 7 July 2021 (UTC)
- There's a phabricator ticket for it here, with a brief discussion of the technical aspects: https://phabricator.wikimedia.org/T212112
- I would say, personally, pretty far. From reading the page above, I see you've also encountered the problem of ambiguous publisher names! I think one way to disambiguate names might be to use Wikidata (whose items have "synonyms", and where you can use the instance of or subclass of properties to make sure you've got the right sort of item as well). But actually doing this well enough to be reliable, so that it can be done automatically and without user input, would involve quite a substantial change to how citoid works in the back end (which right now is a glorified webscraper).
- Though there are front end approaches too, which might be better, at least for Citoid in Visual Editor. Mvolz (WMF) (talk) 07:31, 7 July 2021 (UTC)
No spaces in citation output
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
It drives me absolutely bonkers that the output doesn't put a space before every pipe / after the field content in the citation it generates.
For example, generated as: {{Cite journal|last=Forsman|first=R W|date=1996-05-01|title=Why is the laboratory an afterthought for managed care organizations?|url=https://academic.oup.com/clinchem/article/42/5/813/5646564|journal=Clinical Chemistry|language=en|volume=42|issue=5|pages=813–816|doi=10.1093/clinchem/42.5.813|issn=0009-9147}}
I want it to output as: {{Cite journal |last=Forsman |first=R W |date=1996-05-01 |title=Why is the laboratory an afterthought for managed care organizations? |url=https://academic.oup.com/clinchem/article/42/5/813/5646564 |journal=Clinical Chemistry |language=en |volume=42 |issue=5 |pages=813–816 |doi=10.1093/clinchem/42.5.813 |issn=0009-9147}}
What file or files in the Citoid extension must I modify in order to add a space before every pipe / after field text?
EDIT: I'll add that I do 99 percent of the entry on our wiki, and I almost exclusively work in the source editor. The only reason I added Citoid is because of years and years of manually typing citations out. The bummer, of course, is that the Automatic feature of Citoid only works in VisualEditor, not in the source editing screen. But the time savings of automatically generating citations was tantalizing. Yet because most of my work is in source editing, I still want slightly better readability of the generated citations, and thus I want to add a space before every pipe. Hopefully this is a relatively easy thing for me to modify?
Thank you. Lostraven (talk) 17:51, 4 August 2021 (UTC)
- After a lot of digging, I found a Phabricator ticket from 2020. Does Citoid not manage the output to the text editor? Or is it VisualEditor, as this ticket suggests? Digging further. Lostraven (talk) 19:44, 4 August 2021 (UTC)
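The spacing requested above can also be applied after the fact to a generated citation. The following is only a minimal post-processing sketch in Python, not a change to Citoid or VisualEditor themselves; the function name `add_space_before_pipes` is hypothetical, and the sketch assumes the input is a single template call without triple-brace parameters.

```python
def add_space_before_pipes(template: str) -> str:
    """Insert a space before each top-level parameter pipe of one template call.

    Pipes inside [[wikilinks]] or nested {{templates}} are left untouched.
    Assumption: the input contains no {{{triple-brace}}} parameters.
    """
    out = []
    brace_depth = 0  # nesting level of {{ ... }}
    link_depth = 0   # nesting level of [[ ... ]]
    i = 0
    while i < len(template):
        pair = template[i:i + 2]
        if pair in ("{{", "}}", "[[", "]]"):
            if pair == "{{":
                brace_depth += 1
            elif pair == "}}":
                brace_depth -= 1
            elif pair == "[[":
                link_depth += 1
            else:
                link_depth -= 1
            out.append(pair)
            i += 2
            continue
        ch = template[i]
        if ch == "|" and brace_depth == 1 and link_depth == 0:
            # Only add a space if the field does not already end with whitespace.
            if out and not out[-1].endswith((" ", "\n")):
                out.append(" ")
        out.append(ch)
        i += 1
    return "".join(out)


example = "{{Cite journal|last=Forsman|first=R W|date=1996-05-01|title=[[Foo|Bar]]}}"
print(add_space_before_pipes(example))
```

For the shortened example, this prints `{{Cite journal |last=Forsman |first=R W |date=1996-05-01 |title=[[Foo|Bar]]}}`; the pipe inside the wikilink is kept as-is.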
Adding new fields after automatic generation
Oftentimes after attempting to use the "automatic" feature to generate a citation, the output is missing several fields that can be filled (most commonly authors and dates). It would be helpful to be able to manually add these fields back in through a UI after the generation as well. In VE you can do it by inserting, and then clicking Edit on the generated template, but this is a bit clunky, and when doing source editing in the 2017 wikitext editor it is impossible. Would it be possible to have another step after the generation to "fill in the blanks", or at least to edit them after the fact in the 2017 wikitext editor? Thanks! Eviolite (talk) 04:05, 8 September 2021 (UTC)
- If memory serves, someone else suggested this several years ago. I'll see if I can find the phab: number. I don't expect the WMF's Editing team to work on this tool for a while (they're focused on Talk pages project this year), so it may be a long time before we get what we want. Whatamidoing (WMF) (talk) 19:33, 8 September 2021 (UTC)
- Izno in the Wikimedia community discord server found it, it's phab:T174585. Apologies for not replying with it earlier, but thanks for the update. Eviolite (talk) 19:39, 8 September 2021 (UTC)
DOI, ISBN and ISSN sources
What are the source databases for generating references from DOI, ISBN and ISSN? Juandev (talk) 10:25, 14 October 2021 (UTC)
- ISSN has no resolver since a series of journal issues over decades is not a single article as subject for citation.
- The available codes are: DOI ISBN PMC PMID.
- All resolvers are listed at phab:diffusion/GZTT. PerfektesChaos (talk) 14:19, 14 October 2021 (UTC)
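To illustrate the four identifier types listed above, here is a rough Python sketch that guesses which kind of identifier a query string looks like. The patterns and the function name `classify_identifier` are illustrative assumptions, not the rules Citoid or its resolvers actually apply, and some inputs are inherently ambiguous (a 10-digit number is treated as an ISBN here).

```python
import re

# Illustrative patterns only; these are assumptions, not Citoid's actual rules.
PATTERNS = [
    ("DOI", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("PMC", re.compile(r"^PMC\d+$", re.IGNORECASE)),
    ("ISBN", re.compile(r"^(?:\d{9}[\dXx]|\d{13})$")),
    ("PMID", re.compile(r"^\d{1,8}$")),
]


def classify_identifier(query: str) -> str:
    """Return "DOI", "PMC", "ISBN", "PMID", or "unknown" for a query string."""
    cleaned = query.strip()
    # ISBNs are often written with hyphens or spaces, so compare a compact form.
    compact = cleaned.replace("-", "").replace(" ", "")
    for name, pattern in PATTERNS:
        candidate = compact if name == "ISBN" else cleaned
        if pattern.match(candidate):
            return name
    return "unknown"


for query in ["10.1093/clinchem/42.5.813", "PMC3000000", "978-0-306-40615-7", "0009-9147"]:
    print(query, "->", classify_identifier(query))
```

In this sketch the ISSN `0009-9147` falls through to `unknown`, which mirrors the point above that ISSN has no resolver.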
Translation
Why is most of the text not marked for translation? DDPAT (talk) 20:32, 20 November 2021 (UTC)
- Citoid/Enabling_Citoid_on_your_wiki has a subset of the information and has better translation coverage. Mvolz (WMF) (talk) 09:35, 24 November 2021 (UTC)
website specifications for citations
Where are the specifications for what websites need to do to enable citoid to generate citations from URLs?
I presume these are W3C specifications and have to do with HTML tags, but what documentation do we use to explain what website operators need to do to make themselves aligned with citoid? Is there anything in the Wikimedia platform?
I am imagining tags like author, title, date, etc. Where is the full list of what citoid takes, along with instructions? Blue Rasberry (talk) 22:39, 21 November 2021 (UTC)
- I suggest you read en:Zotero as introduction and continue by crawling zotero.org, somewhere there is the official documentation, since you ask for a full list.
- There are known websites where information can be retrieved individually, and if unknown then general methods like en:Dublin Core will be tried. PerfektesChaos (talk) 16:49, 23 November 2021 (UTC)
- I might be lost. I was expecting Wikimedia documentation, and I think you are suggesting that compliance with Zotero is equivalent to compliance with Wikipedia. Is this the case, and will this be the case for the foreseeable future?
- I looked in Zotero's FAQ. They seem to be speaking to users, not to webmasters. I am looking for advice for webmasters.
- I was looking for the specifications which Wikipedia citation lookup services use to generate citations. For example, if I input a New York Times URL, then somehow the tool knows that NYT has specified title, author, date, and other fields. Many other less-developed websites do not give this information to the wiki lookup tool. I want to know what NYT is doing, or rather, where the authoritative web recommendations are which specify what websites should do to be machine readable. I want there to be a Wikipedia recommendation page for webmasters telling them what to do.
- If the answer is "there seems to be no discussion of this for MediaWiki / Wikimedia", then that is useful information. Is Dublin Core the recommendation which Wikipedia editors should give to the world's web developers in Wikimedia documentation for how they should maximize their compatibility with this platform and receive their best citations?
- thanks Blue Rasberry (talk) 21:08, 23 November 2021 (UTC)
- This sort of thing used to be handled by citoid; now it's handled by Zotero translation-server directly. If not handled by a specific website translator (as with the New York Times; the full list is at https://github.com/zotero/translators), the website is handled by the embedded metadata translator (https://github.com/zotero/translators/blob/master/Embedded%20Metadata.js), which has support for Highwire metadata, as well as the following ontologies:
- bib: "http://purl.org/net/biblio#",
- bibo: "http://purl.org/ontology/bibo/",
- dc: "http://purl.org/dc/elements/1.1/",
- dcterms: "http://purl.org/dc/terms/",
- prism: "http://prismstandard.org/namespaces/1.2/basic/",
- foaf: "http://xmlns.com/foaf/0.1/",
- vcard: "http://nwalsh.com/rdf/vCard#",
- link: "http://purl.org/rss/1.0/modules/link/",
- z: "http://www.zotero.org/namespaces/export#",
- eprint: "http://purl.org/eprint/terms/",
- eprints: "http://purl.org/eprint/terms/",
- og: "http://ogp.me/ns#", // Used for Facebook's OpenGraph Protocol
- article: "http://ogp.me/ns/article#",
- book: "http://ogp.me/ns/book#",
- music: "http://ogp.me/ns/music#",
- video: "http://ogp.me/ns/video#",
- so: "http://schema.org/",
- codemeta: "https://codemeta.github.io/terms/",
- rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
- And then as a fallback some lower quality metadata.
- Of the available ones, I think from a webmaster's perspective eprints would give the highest quality results, because this standard is really designed for citations and most closely matches Zotero's internal standard; it would be good for journal articles, newspaper articles, and websites. By contrast, Dublin Core is more common but doesn't always map that nicely. For music and video, Facebook's Open Graph metadata standard might be better, but I'm not really sure.
- In terms of what format the metadata should be included in the page in, including the metadata tags in the html itself is safest. Zotero has support for rdf but citoid doesn't; citoid has support for json-ld but Zotero doesn't. Mvolz (WMF) (talk) 09:16, 24 November 2021 (UTC)
- Hi, @Mvolz (WMF)!
- You say that "citoid has support for json-ld but Zotero doesn't". I'd appreciate it if you could elaborate on this, please.
- I see that citoid's `Scraper` uses (here) html-metadata lib's `parseAll` function which does support JSON-LD; it returns a promise that resolves to a `metadata` object with a `jsonLd` property.
- However, this `metadata` object is passed to the `matchIDs` function (here), which does not seem to use this `jsonLd` property.
- The `metadata` object is then passed to the `addMetadata` function (here), and inside it to the `addItemType` function (here), none of which seem to use its `jsonLd` property either.
- Finally, the data in the `jsonLd` property doesn't show in the final citoid response (see T270816).
- Am I missing something? Thanks! Diegodlh (talk) 04:18, 24 February 2022 (UTC)
- For a while we used citoid's native translator for a lot of websites, but at some point we switched to using zotero for everything unless zotero fails/goes down. So since the zotero translator doesn't support json-ld it won't show up unless zotero fails (which happens rarely). The issue is tracked in zotero here:
- https://github.com/zotero/translators/issues/917 Mvolz (WMF) (talk) 12:30, 24 February 2022 (UTC)
- Thanks, Marielle. I switched Zotero translation off in my local citoid instance (setting `zotero` to `false` in the `config.yaml` file), but although the output citation changed (as expected) the JSON-LD is still not present.
- I reviewed the `addMetadata` function in the `Scraper` module and I still don't see where `html-metadata`'s `jsonLd` is being used. I see there are custom (citoid) translators for highwire, bepress, opengraph, etc., but I don't see a translator for JSON-LD.
- Am I missing something? Diegodlh (talk) 15:51, 24 February 2022 (UTC)
- Which website are you scraping? We only parse json-ld if it's in the html, if it's in a linked file it won't get scraped. Mvolz (WMF) (talk) 13:47, 23 March 2022 (UTC)
- Hi, @Mvolz (WMF). Sorry for the delay.
- See for example https://www.perlego.com/book/1431388/qualitative-research-practice-a-guide-for-social-science-students-and-researchers-pdf?queryID=8d25693afbbc254b9927e5d0f7dac19f&searchIndexType=books.
- The correct item type (book) and author names are available in one of the JSON-LD objects, and are in fact available in the metadata object returned by html-metadata's parseAll (see my original comment above).
- However, Citoid (with Zotero turned off) returns a wrong item type (webpage) and no author names. Diegodlh (talk) 14:37, 12 April 2022 (UTC)
- As a provider of HTML documents I would offer multiple general metadata simultaneously.
- and more.
- They do not cause conflicts since they have separate naming schemes.
- Leave it to the audience and let them pick up what they understand.
- Zotero etc. have some heuristics and will make their choice. PerfektesChaos (talk) 15:18, 24 November 2021 (UTC)
- Wow thanks this is what I wanted. This is a bit heavy for me so I will read and think for a while. Thanks a lot for the answers, opinions, and the links. Blue Rasberry (talk) 15:47, 24 November 2021 (UTC)
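The kinds of embedded metadata discussed in this thread (Highwire-style `citation_*` tags, Dublin Core, Open Graph, and JSON-LD placed directly in the HTML) can coexist on one page. The following Python sketch shows roughly what a scraper can collect from such a page. It is not Zotero's Embedded Metadata translator, nor citoid's `Scraper` or the html-metadata library; the class name `MetadataCollector` and the sample page are hypothetical, and only the standard library is used.

```python
import json
from html.parser import HTMLParser


class MetadataCollector(HTMLParser):
    """Collect <meta> name/property pairs and inline JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.meta = {}     # e.g. {"citation_title": ["..."], "og:type": ["..."]}
        self.json_ld = []  # parsed JSON-LD objects found in the HTML itself
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Highwire and Dublin Core use name=..., Open Graph uses property=...
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta.setdefault(key, []).append(attrs["content"])
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed blocks are simply skipped in this sketch


# Hypothetical page offering several metadata schemes at once.
SAMPLE = """
<html><head>
<meta name="citation_title" content="An example article">
<meta name="citation_author" content="Doe, Jane">
<meta name="DC.title" content="An example article">
<meta property="og:type" content="article">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Book",
 "name": "An example book", "author": {"@type": "Person", "name": "Jane Doe"}}
</script>
</head><body></body></html>
"""

collector = MetadataCollector()
collector.feed(SAMPLE)
print(collector.meta)
print(collector.json_ld)
```

Because each scheme uses its own naming, the tags do not collide; which of them ends up in a citation depends on the consumer, as described above. JSON-LD that lives in a separate linked file would not be seen by a parser like this one, which only reads the HTML it is given.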