Talk:Citoid


Previous archives are at /Archive 1

"but also may be accessible from the wikicode editing palette, if activated."

Jasonkhanlar

"but also may be accessible from the wikicode editing palette, if activated."


How? Are there steps or instructions for using Citoid without the VisualEditor extension? I can't use VisualEditor because my shared hosting provider does not allow "AllowEncodedSlashes NoDecode": that directive cannot be set in .htaccess, and I do not have access to the server config or virtual host config, as described in httpd.apache.org/docs/2.4/mod/core.html#allowencodedslashes
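For context, here is a minimal sketch of why the directive matters. It assumes (as we understand the VisualEditor setup notes) that page titles are sent to the server as a single percent-encoded URL path segment. A title containing a slash, such as a subpage, then produces `%2F` in the request path, and with Apache's default `AllowEncodedSlashes Off` such URLs are refused with a 404. The helper names and the `/v1/page/` prefix are illustrative only, not taken from MediaWiki's code.

```python
from urllib.parse import quote


def title_path_segment(title: str) -> str:
    """Encode a page title as one URL path segment (hypothetical helper)."""
    # MediaWiki-style titles use underscores instead of spaces.
    # safe="" makes quote() encode "/" as %2F instead of leaving it as-is.
    return quote(title.replace(" ", "_"), safe="")


def refused_by_default_apache(path: str) -> bool:
    """True if the path contains an encoded slash, which Apache answers
    with a 404 unless AllowEncodedSlashes is set to On or NoDecode."""
    return "%2f" in path.lower()


segment = title_path_segment("Talk:Citoid/Archive 1")
path = "/v1/page/" + segment  # hypothetical endpoint prefix
print(segment)                          # Talk%3ACitoid%2FArchive_1
print(refused_by_default_apache(path))  # True
```

Titles without a slash never produce `%2F`, which would explain why the problem only shows up on some pages.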

PerfektesChaos

I know of three approaches: the “2017” source editing toolbar (which uses VisualEditor internally), TemplateWizard (available via the “2010” source editing toolbar), and citoidWikitext@PerfektesChaos.

Diegodlh

@PerfektesChaos, I'm not sure what you mean by approaches (1), the “2017” source editing toolbar, and (2), TemplateWizard available via the “2010” source editing toolbar. Could you elaborate?

There is yet another approach available: using Wikipedia's ProveIt gadget.

Whatamidoing (WMF)

website specifications for citations

Bluerasberry

Where are the specifications for what websites need to do to enable citoid to generate citations from URLs?

I presume these are W3C specifications and have to do with HTML tags, but what documentation do we use to explain what website operators need to do to make their sites work with citoid? Is there anything on the Wikimedia platform?

I am imagining tags like author, title, date, etc. Where is the full list of what citoid accepts, and the instructions?

PerfektesChaos

I suggest reading en:Zotero as an introduction and then browsing zotero.org; the official documentation is somewhere there, since you are asking for a full list.

For known websites, the information can be retrieved by site-specific means; for unknown websites, general methods such as en:Dublin Core will be tried.

Bluerasberry

I might be lost. I was expecting Wikimedia documentation, and I think you are suggesting that compliance with Zotero is equivalent to compliance with Wikipedia. Is this the case, and will this be the case for the foreseeable future?

I looked in Zotero's FAQ. They seem to be speaking to users, not to webmasters. I am looking for advice for webmasters.

I was looking for the specifications that Wikipedia citation lookup services use to generate citations. For example, if I input a New York Times URL, the tool somehow knows that the NYT has specified the title, author, date, and other fields. Many other, less-developed websites do not give this information to the wiki lookup tool. I want to know what the NYT is doing, or rather, where the authoritative web recommendations are that specify what websites should do to be machine-readable. I want there to be a Wikipedia recommendation page for webmasters telling them what to do.

If the answer is "there seems to be no discussion of this for MediaWiki / Wikimedia", then that is useful information. Is Dublin Core the recommendation which Wikipedia editors should give to the world's web developers, in Wikimedia documentation, on how to maximize their compatibility with this platform and get the best citations?

Thanks

Mvolz (WMF)

This used to be handled by citoid; now it's handled directly by Zotero's translation-server. If a website is not handled by a site-specific translator (as the New York Times is; the full list is at https://github.com/zotero/translators), it is handled by the Embedded Metadata translator (https://github.com/zotero/translators/blob/master/Embedded%20Metadata.js), which supports Highwire metadata as well as the following ontologies:

  • bib: http://purl.org/net/biblio#
  • bibo: http://purl.org/ontology/bibo/
  • dc: http://purl.org/dc/elements/1.1/
  • dcterms: http://purl.org/dc/terms/
  • prism: http://prismstandard.org/namespaces/1.2/basic/
  • foaf: http://xmlns.com/foaf/0.1/
  • vcard: http://nwalsh.com/rdf/vCard#
  • link: http://purl.org/rss/1.0/modules/link/
  • z: http://www.zotero.org/namespaces/export#
  • eprint: http://purl.org/eprint/terms/
  • eprints: http://purl.org/eprint/terms/
  • og: http://ogp.me/ns# (used for Facebook's Open Graph protocol)
  • article: http://ogp.me/ns/article#
  • book: http://ogp.me/ns/book#
  • music: http://ogp.me/ns/music#
  • video: http://ogp.me/ns/video#
  • so: http://schema.org/
  • codemeta: https://codemeta.github.io/terms/
  • rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#

Beyond these, some lower-quality metadata is used as a fallback.
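As a rough illustration of what "embedded metadata" means in practice, the sketch below collects `<meta>` tags from a page and groups them by a few of the prefixes above. It is not the Embedded Metadata translator's logic (that lives in Embedded Metadata.js); the prefix table is a hypothetical subset chosen for the example, and only Python's standard `html.parser` is used.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical subset of prefixes, loosely based on the namespaces above.
# "citation_" is the usual Highwire convention.
PREFIXES = {
    "citation_": "highwire",
    "dc.": "dc",
    "dcterms.": "dcterms",
    "eprints.": "eprints",
    "prism.": "prism",
    "og:": "og",
}


class MetaCollector(HTMLParser):
    """Group <meta ... content=...> tags by metadata scheme."""

    def __init__(self) -> None:
        super().__init__()
        self.found = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Highwire and Dublin Core use name=; Open Graph uses property=.
        key = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if not key or content is None:
            return
        for prefix, scheme in PREFIXES.items():
            if key.lower().startswith(prefix):
                # Store the field name without its prefix, e.g. "title".
                self.found[scheme].append((key[len(prefix):], content))
                break


def collect_embedded_metadata(html: str) -> dict:
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return dict(collector.found)
```

Tags that match no prefix are simply ignored here, whereas the real translator also falls back to other, lower-quality sources.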

Of the available ones, I think that from a webmaster's perspective eprints would give the highest-quality results: this standard is really designed for citations, most closely matches Zotero's internal standard, and would be good for journal articles, newspaper articles, and websites. Dublin Core, by contrast, is more common but doesn't always map that nicely. For music and video, Facebook's Open Graph metadata standard might be better, but I'm not really sure.

As for how to include the metadata in the page, putting the metadata tags in the HTML itself is safest. Zotero supports RDF but citoid doesn't; citoid supports JSON-LD but Zotero doesn't.

Diegodlh

Hi, @Mvolz (WMF)!

You say that "citoid has support for json-ld but Zotero doesn't". I'd appreciate it if you could elaborate on this, please.

I see that citoid's Scraper uses (here) the html-metadata library's parseAll function, which does support JSON-LD; it returns a promise that resolves to a metadata object with a jsonLd property.

However, this metadata object is passed to the matchIDs function (here), which does not seem to use this jsonLd property.

The metadata object is then passed to the addMetadata function (here), and inside it to the addItemType function (here), none of which seem to use its jsonLd property either.

Finally, the data in the jsonLd property doesn't show in the final citoid response (see T270816).

Am I missing something? Thanks!

Mvolz (WMF)

For a while we used citoid's native translator for a lot of websites, but at some point we switched to using Zotero for everything unless Zotero fails or goes down. Since the Zotero translator doesn't support JSON-LD, JSON-LD data won't show up unless Zotero fails (which happens rarely). The issue is tracked in Zotero here: https://github.com/zotero/translators/issues/917

Diegodlh

Thanks, Marielle. I switched Zotero translation off in my local citoid instance (setting `zotero` to `false` in the `config.yaml` file), but although the output citation changed (as expected), the JSON-LD is still not present.

I reviewed the `addMetadata` function in the `Scraper` module and I still don't see where `html-metadata`'s `jsonLd` is being used. I see there are custom (citoid) translators for Highwire, bepress, Open Graph, etc., but I don't see one for JSON-LD.

Am I missing something?

Mvolz (WMF)

Which website are you scraping? We only parse JSON-LD if it's inline in the HTML; if it's in a linked file, it won't get scraped.
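To illustrate the distinction between inline and linked JSON-LD, here is a minimal sketch using Python's standard `html.parser`. It reads only `<script type="application/ld+json">` blocks embedded in the page and never follows `<link>` references to separate files, which mirrors the behaviour described above. It is not citoid's code (citoid uses the html-metadata library); the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect objects from inline <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.objects: list = []
        self._inside = False
        self._chunks: list = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            try:
                parsed = json.loads("".join(self._chunks))
            except json.JSONDecodeError:
                return  # skip malformed blocks
            # A block may hold a single object or an array of objects.
            self.objects.extend(parsed if isinstance(parsed, list) else [parsed])


def inline_jsonld(html: str) -> list:
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    return collector.objects
```

A `<link rel="alternate" type="application/ld+json" href="...">` tag is seen by the parser but never fetched, so its data is simply absent from the result.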

Diegodlh

Hi, @Mvolz (WMF). Sorry for the delay.

See for example https://www.perlego.com/book/1431388/qualitative-research-practice-a-guide-for-social-science-students-and-researchers-pdf?queryID=8d25693afbbc254b9927e5d0f7dac19f&searchIndexType=books.

The correct item type (book) and author names are available in one of the JSON-LD objects, and are in fact available in the metadata object returned by html-metadata's parseAll (see my original comment above).

However, Citoid (with Zotero turned off) returns a wrong item type (webpage) and no author names.
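For comparison, the kind of mapping that seems to be missing could look like the sketch below: pick an item type from the JSON-LD `@type` and collect author names. The `SCHEMA_TO_ZOTERO` table is a hypothetical, partial mapping from schema.org types to Zotero item type names, the fallback to `webpage` is an assumption, and the sample object in the usage note is made up rather than copied from the Perlego page.

```python
# Hypothetical, partial mapping from schema.org types to Zotero item types.
SCHEMA_TO_ZOTERO = {
    "Book": "book",
    "ScholarlyArticle": "journalArticle",
    "NewsArticle": "newspaperArticle",
    "BlogPosting": "blogPost",
}


def item_from_jsonld(obj: dict) -> dict:
    """Derive an item type, title and author names from one JSON-LD object."""
    types = obj.get("@type", [])
    if isinstance(types, str):
        types = [types]
    # First recognised type wins; otherwise fall back to a generic web page.
    item_type = next(
        (SCHEMA_TO_ZOTERO[t] for t in types if t in SCHEMA_TO_ZOTERO), "webpage"
    )
    authors = obj.get("author", [])
    if not isinstance(authors, list):
        authors = [authors]  # a single Person object or a plain string
    names = []
    for author in authors:
        if isinstance(author, str):
            names.append(author)
        elif isinstance(author, dict) and "name" in author:
            names.append(author["name"])
    return {"itemType": item_type, "title": obj.get("name"), "creators": names}
```

For example, `{"@type": "Book", "name": "Example Title", "author": [{"@type": "Person", "name": "Author One"}]}` would yield item type `book` with one creator instead of a `webpage` with none.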

PerfektesChaos

As a provider of HTML documents, I would offer several general metadata schemes simultaneously:

and more.

  • They do not cause conflicts since they have separate naming schemes.
  • Leave it to the audience and let them pick up what they understand.
  • Zotero etc. have some heuristics and will make their choice.
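A minimal sketch of that advice, assuming a page wants to expose the same title, authors and date through Highwire (`citation_*`), Dublin Core (`DC.*`) and Open Graph (`og:*`) tags at once. The tag names are common conventions rather than a list Citoid publishes, and the function name is hypothetical.

```python
from html import escape


def citation_meta_tags(title: str, authors: list, date: str) -> str:
    """Render the same bibliographic data in several metadata schemes."""
    # (attribute, key, value): Open Graph uses property=, the others name=.
    tags = [
        ("name", "citation_title", title),
        ("name", "DC.title", title),
        ("property", "og:title", title),
        ("property", "og:type", "article"),
    ]
    for author in authors:
        # Highwire and Dublin Core both repeat the tag once per author.
        tags.append(("name", "citation_author", author))
        tags.append(("name", "DC.creator", author))
    tags.append(("name", "citation_publication_date", date))
    tags.append(("name", "DC.date", date))
    # escape() protects &, <, > and quotes inside attribute values.
    return "\n".join(
        f'<meta {attr}="{escape(key)}" content="{escape(value)}">'
        for attr, key, value in tags
    )
```

Because the schemes use distinct names, a consumer that only understands one of them can ignore the rest.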
Bluerasberry

Wow thanks this is what I wanted. This is a bit heavy for me so I will read and think for a while. Thanks a lot for the answers, opinions, and the links.

Config.yaml

InnerCitadel

Where is config.yaml located? It's not in the citoid extension.

Mvolz (WMF)

That portion is in the directions for installing the service, not the extension.

InnerCitadel

It's in /opt/citoid/config.yaml

Very high CPU usage

InnerCitadel

I'm running my small wiki on a VPS with 2 vCPUs and 4 GB of RAM. The Zotero and citoid services are each using about 45% of total CPU, so together with everything else about 95% of the CPU is in use, and it is slowing down the site. Any ideas? Is this expected behaviour, and does it just need a very beefy server?

Whatamidoing (WMF)

Are you running Chrome?

InnerCitadel

It was starting and quitting repeatedly. Fixed now.

Whatamidoing (WMF)

I'm glad to hear that it's working now. Feel free to come back if the problem reappears.

Citoid on Wikidata?

Daniel Mietchen

I'd like to be able to use Citoid's "cite from ID" functionality to fill in statements on Wikidata items about references with Citoid-covered IDs. Is something like this on the horizon somewhere, possibly through a gadget?

Jdforrester (WMF)

It's certainly something that could be built. I'm not sure it's a great area to focus on until we have actual structured citations on Wikidata; right now you're meant to just add a reference with type=URI and the link – there's not really anywhere for the content that Citoid fetches to go…

Mvolz

Actually, that's not exactly true; it's convention to just add the URI because that's the easiest way to do it. But there is support for structured citations, see claim "original language of work" in wikidata:Q43361. There you can see that the reference is "in" a wikidata item, a book, which has its own metadata, and then there are other properties you can annotate the citation with, like page number and quote, etc.

That said, it is considerably more complicated than what is currently done with citoid/VE or citoid/wikitext, because it requires you to a) search Wikidata and b) create new items on Wikidata, which is already more complicated than VE/wikitext, where it essentially amounts to inserting text. It is also a more complicated language, because each reference has to be annotated with fields that may be specific to the kind of citation: we have to know which fields go in the Wikidata item the reference is "in" (such as volume) and which fields get annotated directly onto the reference (such as page number), and we have to do that for each citoid type (which are just Zotero types; see the incomplete map between Zotero and Wikidata types here: wikidata:Wikidata:WikiProject_Source_MetaData/Source_types).
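To make the two-level split concrete, here is a hypothetical partition of a Zotero-style citation into fields for the Wikidata item the reference is "in" and fields annotated on the reference itself. The field sets are illustrative assumptions built only from the examples given (volume on the item, page number on the reference); they are not a mapping that Citoid or Wikidata defines, and a real version would have to vary by item type.

```python
# Hypothetical field sets for illustration only.
WORK_ITEM_FIELDS = {"title", "date", "volume", "issue", "DOI"}
REFERENCE_FIELDS = {"pages", "accessDate"}


def split_citation(citation: dict) -> tuple:
    """Return (work_item_fields, reference_fields, unplaced_fields)."""
    work_item, reference, unplaced = {}, {}, {}
    for field, value in citation.items():
        if field in WORK_ITEM_FIELDS:
            work_item[field] = value
        elif field in REFERENCE_FIELDS:
            reference[field] = value
        else:
            # e.g. itemType or publicationTitle: these need their own
            # handling (the journal may be a separate item of its own).
            unplaced[field] = value
    return work_item, reference, unplaced
```

Even this toy version leaves fields unplaced, which illustrates why the per-type mapping is the hard part.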

I do agree with James, though, that it's probably not *immediately* in the pipeline, since there is more low-hanging fruit before we get there.

Jdforrester (WMF)

Oh, sure, you can do that in Wikidata, but only if the item you're referencing already exists; there doesn't appear to be community appetite just yet for auto-creating items just to serve as reference points.

He7d3r

Doesn't?

Jdforrester (WMF)

Happy to be proven wrong if you have a link. :-)

Mvolz (WMF)

https://www.wikidata.org/wiki/Help:Sources#Adding_a_source_to_a_statement

"Add the source as an item if: i) it's not in Wikidata already and ii) it is not a webpage"

So if citoid retrieves information about a book, newspaper article, or journal article, both the journal article itself and the journal it's published in should be added as items in Wikidata, as described at: https://www.wikidata.org/wiki/Help:Sources#Scientific.2C_newspaper_or_magazine_article

If the item is a webpage, then only the reference URL should be inserted; an item for the particular website (publisher) itself should only be added if it already exists in Wikidata. Date and access date can be added: https://www.wikidata.org/wiki/Help:Sources#Web_page
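The Help:Sources rule quoted above can be read as a small decision procedure. The sketch below is a hypothetical helper under that reading: it only covers the three outcomes discussed here, uses Zotero item type names as citoid returns them, and `Q123` in the usage note is a placeholder, not a real source item.

```python
from typing import Optional


def wikidata_reference_plan(item_type: str, existing_qid: Optional[str]) -> str:
    """Decide how a citoid result would be cited on Wikidata (sketch)."""
    if item_type == "webpage":
        # Web pages are cited with a reference URL (plus dates), not a new item.
        return "reference URL"
    if existing_qid is not None:
        # The source is already in Wikidata: point the reference at it.
        return f"stated in {existing_qid}"
    # Not a web page and not yet in Wikidata: create an item, then cite it.
    return "create item, then cite it"
```

For example, `wikidata_reference_plan("book", "Q123")` gives `"stated in Q123"`, while an unknown journal article falls through to item creation, which is exactly where miscategorised results would cause trouble.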

I can see an issue arising where, if citoid miscategorises a blog as a newspaper, this could conceivably contribute to "junk" in Wikidata. Right now, though, it's very "cautious" in this regard: currently only results coming from Zotero, where a human has assessed this, are classified as newspapers or journals. URLs with no Zotero translator that have the Open Graph type "article" are currently assigned itemType "blogPost", even though plausibly these could be news or journal articles as well.
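The cautious classification described above could be sketched as follows. This is a hypothetical function, not citoid's code: it assumes a Zotero translator's item type is passed in when one matched, and the final `webpage` default for pages without an Open Graph "article" type is our assumption, since the comment does not say what happens in that case.

```python
from typing import Optional


def fallback_item_type(zotero_item_type: Optional[str], og_type: Optional[str]) -> str:
    """Pick an item type, preferring a human-written Zotero translator."""
    if zotero_item_type is not None:
        # A site-specific Zotero translator recognised the page.
        return zotero_item_type
    if og_type == "article":
        # Cautious: never promoted to newspaperArticle or journalArticle.
        return "blogPost"
    return "webpage"  # assumed default for everything else
```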

Jdforrester (WMF)
Mvolz (WMF)

I believe User:Daniel Mietchen's question was about using citoid to create references for claims in Wikidata, which is what I was responding to. Is that not the case?

Jdforrester (WMF)

Sorry, you're right, I lost track of the parallel conversations. :-)

He7d3r

Could you clarify what you mean by "reference points"?

So9q

Interesting discussion. I would like Citoid to support Wikidata, including creating new items from identifiers when they don't exist already.

Translation

DDPAT 2.0

Why is most of the text not marked for translation?

Mvolz (WMF)

DOI, ISBN and ISSN sources

Juandev

What are the source databases for generating references from DOI, ISBN and ISSN?

PerfektesChaos
  • For DOI, it is the resolved URL itself.
  • For ISBN I guess it is WorldCat. You may find the evaluation here.

There is no ISSN resolver, since an ISSN identifies a series of journal issues spanning decades, not a single article that could be the subject of a citation.

The available codes are: DOI, ISBN, PMC and PMID.
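To illustrate how the four code types can be told apart, here is a hypothetical classifier; citoid's own detection may differ. It recognises a DOI by its `10.` prefix and rewrites it as a `https://doi.org/` URL (matching the point that a DOI is resolved via its URL), validates ISBN-10 and ISBN-13 check digits, and treats `PMC` plus digits as a PMC ID and a short all-digit string as a PMID. The regular expressions are simplified assumptions.

```python
import re
from typing import Optional, Tuple

DOI_RE = re.compile(r"10\.\d{4,9}/\S+")
PMC_RE = re.compile(r"PMC\d+", re.IGNORECASE)
PMID_RE = re.compile(r"\d{1,8}")


def isbn10_ok(s: str) -> bool:
    # Weights 10 down to 1; a final "X" stands for 10.
    total = sum((10 if c in "Xx" else int(c)) * (10 - i) for i, c in enumerate(s))
    return total % 11 == 0


def isbn13_ok(s: str) -> bool:
    # Alternating weights 1, 3, 1, 3, ...
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
    return total % 10 == 0


def classify_identifier(raw: str) -> Optional[Tuple[str, str]]:
    """Return (kind, normalised value) or None if nothing matches."""
    text = raw.strip()
    if text.lower().startswith("https://doi.org/"):
        text = text[len("https://doi.org/"):]
    if DOI_RE.fullmatch(text):
        return "doi", "https://doi.org/" + text
    compact = re.sub(r"[\s-]", "", text)  # ISBNs are often hyphenated
    if re.fullmatch(r"\d{13}", compact) and isbn13_ok(compact):
        return "isbn", compact
    if re.fullmatch(r"\d{9}[\dXx]", compact) and isbn10_ok(compact):
        return "isbn", compact.upper()
    if PMC_RE.fullmatch(text):
        return "pmc", text.upper()
    if PMID_RE.fullmatch(text):
        return "pmid", text
    return None
```

The check-digit tests matter because a mistyped ISBN would otherwise be sent off for lookup and come back with an unrelated or empty result.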


Adding new fields after automatic generation

Summary by Whatamidoing (WMF)
Eviolite

Oftentimes, after I use the "automatic" feature to generate a citation, the output is missing several fields that could be filled in (most commonly authors and dates). It would be helpful to be able to manually add these fields back in a UI after the generation. In VE you can do it by inserting the citation and then clicking Edit on the generated template, but this is a bit clunky, and when source editing in the 2017 wikitext editor it is impossible. Would it be possible to have another step after the generation to "fill in the blanks", or at least to edit them after the fact in the 2017 wikitext editor? Thanks!

Whatamidoing (WMF)

If memory serves, someone else suggested this several years ago. I'll see if I can find the phab: number. I don't expect the WMF's Editing team to work on this tool for a while (they're focused on the Talk pages project this year), so it may be a long time before we get what we want.

Eviolite

No spaces in citation output

Summary by Lostraven

Someone commented on the Phabricator ticket and made me aware of these custom format settings in Extension:TemplateData. Wow! Huge help. However, I feel like this information is massively lacking in the Citoid documentation, and I'm going to go add that information now.

Lostraven

It drives me absolutely bonkers that the output doesn't put a space before every pipe / after the field content in the citation it generates.

For example, generated as: {{Cite journal|last=Forsman|first=R W|date=1996-05-01|title=Why is the laboratory an afterthought for managed care organizations?|url=https://academic.oup.com/clinchem/article/42/5/813/5646564|journal=Clinical Chemistry|language=en|volume=42|issue=5|pages=813–816|doi=10.1093/clinchem/42.5.813|issn=0009-9147}}

I want it to output as: {{Cite journal |last=Forsman |first=R W |date=1996-05-01 |title=Why is the laboratory an afterthought for managed care organizations? |url=https://academic.oup.com/clinchem/article/42/5/813/5646564 |journal=Clinical Chemistry |language=en |volume=42 |issue=5 |pages=813–816 |doi=10.1093/clinchem/42.5.813 |issn=0009-9147}}
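If post-processing the generated text is acceptable, a small script can add the spacing. The thread summary points to TemplateData's format setting as the supported way to change Citoid's output; the sketch below is only a hypothetical, standalone alternative. It inserts a space before each pipe that separates the outer template's parameters and leaves pipes inside nested templates or wikilinks alone. It does not understand triple-brace parameters, `<nowiki>` sections or comments.

```python
def space_before_pipes(wikitext: str) -> str:
    """Add a space before each top-level parameter pipe of a template."""
    out = []
    template_depth = 0  # nesting level of {{ ... }}
    link_depth = 0      # nesting level of [[ ... ]]
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "}}", "[[", "]]"):
            if pair == "{{":
                template_depth += 1
            elif pair == "}}":
                template_depth -= 1
            elif pair == "[[":
                link_depth += 1
            else:
                link_depth -= 1
            out.append(pair)
            i += 2
            continue
        ch = wikitext[i]
        # Only pipes of the outermost template, outside any wikilink.
        if ch == "|" and template_depth == 1 and link_depth == 0:
            if out and not out[-1].endswith(" "):
                out.append(" ")
        out.append(ch)
        i += 1
    return "".join(out)
```

Applied to the generated {{Cite journal}} above, this yields the desired spaced form, and running it twice changes nothing because existing spaces are kept.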


What file or files in the Citoid extension must I modify in order to add a space before every pipe / after field text?

EDIT: I'll add that I do 99 percent of the editing on our wiki, and I almost exclusively work in the source editor. The only reason I added Citoid is that I spent years and years manually typing citations out. The bummer, of course, is that Citoid's Automatic feature only works in VisualEditor, not in the source editing screen. But the time saved by automatically generating citations was tantalizing. Because most of my work is in source editing, I still want slightly better readability of the generated citations, and thus I want to add a space before every pipe. Hopefully this is a relatively easy thing for me to modify?


Thank you.

Lostraven

After a lot of digging, I found a Phabricator ticket from 2020. Does Citoid not manage the output to the text editor? Or is it VisualEditor, as this ticket suggests? Digging further.

A visual Zotero/Citoid translator editor?

Diegodlh

Hi all! I'm thinking of applying for a software grant to develop a visual Zotero/Citoid translator editor. Before writing the proposal, I would really appreciate your feedback about the idea, as summarized here. Thanks!

Diegodlh

A proposal has been presented here. We would appreciate your thoughts, comments and questions in its discussion page, as well as your endorsements if you would like to support it. Thank you!

Whatamidoing (WMF)

In principle, it sounds like a great idea to me. @Czar could probably give you more useful advice.

Diegodlh

Scann and I are happy to announce that our grant proposal has been approved!! We are now putting together an Advisory Board to help us think and tackle critical aspects of the project, and build sustainability and community involvement. We will be thrilled to have anyone from the Citoid team (or interested in Citoid) onboard! Applications are open here. Thank you!!
