Extension talk:Cargo

From MediaWiki.org

Error when used with Approved Revs

Hello, I have "Approved Revs" installed and when hitting "Approve" for an edit I get an error: Notice: Undefined variable: cdb in /opt/bitnami/apps/mediawiki/htdocs/extensions/Cargo/Cargo.hooks.php on line 245
Using:
MediaWiki: 1.33.1, Approved Revs: 1.2, Cargo: 2.3.1
If Cargo is disabled the "Approved Revs" extension works as expected. Is this a configuration issue or something else?--Greg Hitchon (talk) 15:52, 18 November 2019 (UTC)

It seems like if you add $cdb = CargoUtils::getDB(); to the onARRevisionApproved function it resolves the issue.--Greg Hitchon (talk) 16:04, 18 November 2019 (UTC)
However this still seems like not the correct fix as when approving a page it gets deleted from the Cargo table. Any advice welcome! --Greg Hitchon (talk) 16:50, 18 November 2019 (UTC)
Sorry about the first problem - that's a bug from September. I just checked in a fix for it. I don't know what's causing that second problem, though. It doesn't happen for me - with this fix in place, Cargo data seems to get set correctly after every approval (and unapproval). Does the problem happen for you consistently? Yaron Koren (talk) 17:58, 18 November 2019 (UTC)
Thanks for the quick fix!!! Maybe it is due to having $egApprovedRevsAutomaticApprovals = false;. If an approved rev exists and a new edit is made the "approved" cargo data is removed. The expected behaviour in this case would be for nothing to happen.--Greg Hitchon (talk) 19:03, 18 November 2019 (UTC)
Sorry for the multiple edits here (new to this so not sure of the etiquette/if this is the right place for this convo) Tried with $egApprovedRevsAutomaticApprovals = true; and had a similar result of the Cargo data being removed upon save. It is properly saved if I 'Approve' another revision --Greg Hitchon (talk) 19:11, 18 November 2019 (UTC)
The edits here are fine. And thanks for clarifying that the problem came about when doing an edit after an approval, not just the approval itself. I was indeed able to replicate that problem, and it turned out to be a bug due to a change in the Approved Revs code about a year ago. (I guess the combination of Approved Revs + Cargo is still not that popular, since no one reported this problem until now.) I just checked in another fix to the Cargo code - hopefully everything works now. Please let me know if there are any more problems. Yaron Koren (talk) 20:23, 18 November 2019 (UTC)
Thanks again for the quick fixes here! The update has fixed that issue generally however one reproducible issue is:
  1. Start with a page with an Approved Rev ($egApprovedRevsAutomaticApprovals = false;)
  2. Make an edit (after the edit everything is fine)
  3. Approve the latest revision (approving any other revision seems to work)
  4. Cargo data is deleted
Little disconcerting that no one has reported the issue in 1+ years. It seems like those extensions fulfill our business needs perfectly (thanks for the work on them) so really want to get it sorted out. I can spend some time getting familiar with the code but am new to the php/wiki world so not sure how much help I will be off the bat here.--Greg Hitchon (talk) 01:17, 19 November 2019 (UTC)
Looked into this a little further. It looks like in a few cases the parser is not running after the options are set and so skipping adding data. If you manually trigger these then it seems to work:
  1. onARRevisionApproved: Add CargoUtils::parsePageForStorage($title,null); after the "CargoStore::settings..." line
  2. onARRevisionUnapproved: Add CargoUtils::parsePageForStorage($title,$parser->mText); after the "CargoStore::settings..." line
Hope this helps! --Greg Hitchon (talk) 16:25, 19 November 2019 (UTC)

Yes, it's strange that these bugs had not been caught (or at least reported) until now - Cargo and Approved Revs both have significant usage. Maybe not together, though; I don't know. Anyway, sorry about that problem. And good sleuthing! I just checked in what I think is a fix for this bug, based in part on your suggested change. Yaron Koren (talk) 20:47, 19 November 2019 (UTC)

Great thanks again! Tested and can confirm this fixes the issue. Thought there was a similar issue with "unapprove" but can't replicate it so might have been something I had changed. Will hopefully put the 'Approved Revs'/'Cargo' combination through the paces and make it easier for others to adopt! --Greg Hitchon (talk) 20:59, 19 November 2019 (UTC)

Display format pie chart

Hi all, I am trying to build some pie charts with the display format=pie chart. The documentation, however, mentions only a few parameters. Are there any more, to allow customization? I would be really happy to see parameters like the ones for jqplotseries in SMW; specifically, I would like to customize the chart colour and the display of data labels (I can't enter an external link, but for jqplotseries they were called "datalabels" and "colorscheme"). Any hope?

Let me first note that I think pie charts are a bad idea, and you should use bar charts instead! I know Cargo has a "pie chart" format, but that's only because I was pressured into adding it. Of course, this same request could apply to bar charts too. I just wanted to note that. Anyway, I think you can set the data labels by just setting aliases for each field name - do you want something more than that? Setting colors (or a color scheme) is a reasonable idea, though. Which would you prefer - manual control of colors, or a color scheme? Yaron Koren (talk) 16:10, 20 November 2019 (UTC)
Hi, sure, bar charts would work too in one of my cases, though in the second case the focus is really on shares, so pie charts seem to be the better choice in principle. I see that the bar chart format automatically shows the data value next to the bar. I meant the display of data values (absolute, or percent), not data labels, sorry about that. But when choosing bar chart, I get a legend that says "Blank value 1". So, what I need would be (in that order of priority):
  1. Customized colour (better than color scheme) per value
  2. Suppress the legend
  3. Pie chart: Choose to display absolute or percentage data for each value

If you could get around to that, that would be wonderful. If you don't want to invest in the pie chart, I can make do with the bar chart only. Thank you so much! --MelanieUh (talk) 10:23, 21 November 2019 (UTC)

Hi Yaron, any chance you have this issue on your todo list yet and an idea when you might be able to do it? Much appreciated!

--MelanieUh (talk) 05:25, 3 December 2019 (UTC)

Sorry for the delay. I understand the first two requests - which could probably be handled with parameters like "colors=" and "show legend=". But can you explain the last one, with absolute or percentage data? Yaron Koren (talk) 18:18, 4 December 2019 (UTC)
Okay, having looked into it more, I now understand the last one as well - the NVD3 library lets you display the number or percentage of each "slice", instead of its name. My new question is: can there be any kind of intelligent defaults made for all these settings? Not for colors, but for whether to show the legend, and which text to show for the labels (or whether to show the labels at all). Like, for instance - if the legend is shown, the labels should be numbers, but if there's no legend, they should always be the names? And is there any logic for whether it should be numbers or percentages? It seems to me like absolute numbers are always better to show, since the pie chart itself is supposed to give a sense for the percentage. Yaron Koren (talk) 04:09, 5 December 2019 (UTC)
Hey! My suggestions for default would be: Legend=on, Labels= numbers (and can be changed to percent, name or turned off completely). If no legend, Labels can indeed be name AND number. Would it be complicated to have two things in the Labels parameter, separated by comma? In Excel I often see that the series name AND the number is shown, or number AND percentage, separated by comma. --MelanieUh (talk) 07:45, 5 December 2019 (UTC)
Okay, thanks. The defaults of "legends on, labels = numbers; legends turned off, labels = text" make sense, I think. Yes, you could have both text and numbers for the labels, but (a) it might get too crowded in the chart (unless it's huge), and (b) I generally trust the makers of the NVD3 library, who have dealt with a lot more use cases - they only set the options of name, number and percentage. Though come to think of it, the size set for the pie chart could affect the defaults as well... (My goal with Cargo is to make the defaults as "smart" as possible.) Yaron Koren (talk) 13:55, 5 December 2019 (UTC)
I just released version 2.4 of Cargo - the pie chart format now has "colors" and "hide legend" parameters, and there are some other improvements to the format. I hope these additions work out for you... Yaron Koren (talk) 19:07, 9 January 2020 (UTC)

Cause of multiple rows for the same item

I've had problems occasionally where a row is repeated in a Cargo (2.3.1 - 58278ee) table and I think I found the cause. I have a Text field with:

input type=textarea|autogrow|editor=visualeditor.

If the user includes an asterisk (*) at the beginning of a line, it seems to cause the issue; removing the asterisk (or moving it from the beginning of the line) seems to solve the problem. — Bryandamon (talk) 01:40, 27 November 2019 (UTC)

That’s really interesting, as I still get duplicate rows, despite the problem having been solved for most people, and I fairly often have an asterisk (for a bullet point) at the start of a line in a textarea... Jonathan3 (talk) 10:58, 27 November 2019 (UTC)
Interesting indeed. Out of curiosity, are you adding those asterisk bullet points with mobile? Have you tried removing (or moving) the asterisk and if so, did it fix your duplicate row? — Bryandamon (talk) 23:22, 27 November 2019 (UTC)
I nearly always edit using the PC. I have never been able to sit down and work anything out for sure. I have a MySQL query which identifies duplicate rows (incidentally, this might be something useful to have on the Cargo web interface) and when I notice duplicate rows I just run the command line script to recreate the database. Jonathan3 (talk) 23:55, 27 November 2019 (UTC)
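A duplicate-row check of the kind mentioned above can be sketched as a GROUP BY ... HAVING COUNT(*) > 1 query. The example below is only an illustration: it uses Python's built-in sqlite3 module rather than MySQL, and the table name "cargo__Items" and its columns are hypothetical stand-ins for a real Cargo table.

```python
import sqlite3


def find_duplicate_rows(conn, table, columns):
    """Return (values..., count) for every combination of `columns` stored more than once."""
    col_list = ", ".join(f'"{c}"' for c in columns)
    sql = (
        f'SELECT {col_list}, COUNT(*) AS n FROM "{table}" '
        f"GROUP BY {col_list} HAVING COUNT(*) > 1 ORDER BY {col_list}"
    )
    return conn.execute(sql).fetchall()


def build_demo_db():
    """Create an in-memory table (hypothetical schema) containing one duplicated row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE "cargo__Items" ("_pageID" INTEGER, "name" TEXT, "notes" TEXT)'
    )
    conn.executemany(
        'INSERT INTO "cargo__Items" VALUES (?, ?, ?)',
        [
            (1, "Alpha", "* bullet"),
            (1, "Alpha", "* bullet"),  # accidental duplicate of the row above
            (2, "Beta", "plain"),
        ],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(find_duplicate_rows(demo, "cargo__Items", ["_pageID", "name", "notes"]))
```

Only the columns that hold page data should be listed; a per-row identifier column would make every row look unique.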
I'm talking with Bryan about this right now, here's a reproduction of the duplicate rows issue - link - except using a nowiki tag instead of a bullet point. My immediate guess is this is a VE issue where bullet points are parsed later than the Cargo store that's causing the problem. I'm not sure how to fix that part, it sounds potentially really complicated to deal with, but there's definitely an issue where problematic tags cause duplicate rows at least. --RheingoldRiver (talk) 18:32, 4 December 2019 (UTC)
This patch would probably help in some of these cases - I really should just check it in. I don't know if it would help with that starting asterisk case, though... I don't know what would be causing that. Yaron Koren (talk) 23:25, 4 December 2019 (UTC)

Error with Cargo table

Recently I enabled the Cargo extension on rs.miraheze.org. This is the first time I have used Cargo. However, after editing a template declaring a Cargo table, an error occurs. See Special:CargoTables and Template:关卡信息. Special:CargoTables is also currently empty. --SolidBlock (talk) 11:50, 5 December 2019 (UTC)

I don't know what's causing that error. You would need to get an admin to add "$wgShowExceptionDetails = true;" to the settings for your wiki to see the actual error message, if that's possible. Yaron Koren (talk) 13:47, 5 December 2019 (UTC)
I've informed miraheze stewards to do that. Thanks. --SolidBlock (talk) 02:47, 15 December 2019 (UTC)

populating _pageData

I have been finding data, especially about categories, not being in table _pageData where I would expect it. I realize that relying less on categories and more on cargo data would work around this, but I have some older wikis that used to rely more on SMW which worked ok with categories, so changing this retrospectively would take a fair bit of content change.

Something similar was mentioned here before, with a suggested solution, but no sign if something was done with it.

I would be prepared to add a php setCargoPageData.php --replacement to our overnight maintenance script, but then the table still would need to be manually replaced in Special:CargoTables.

Should it be necessary to run setCargoPageData.php regularly, or is there another solution? And if setCargoPageData.php is the way to go, then how do I automate the "replace"?

Actually, I just realized a second issue: I guess the --replacement option should deal with the replace, but for me it does not: the table gets generated, but I have to manually replace it.

Right, this issue hasn't been fixed yet - it would require making use of some other MediaWiki hook to update just the "categories" field. Until then, a cron job is probably the way to go, sadly. The "--replacement" option deliberately doesn't replace the table at the end, in case the admin wants to double-check the new data first, but it does make sense to add an option to actually do the replacement, like "--doReplace". Yaron Koren (talk) 16:41, 12 December 2019 (UTC)
I would have expected the "--replacement" switch to cause the replacement to happen. What happens if I run setCargoPageData.php without the switch, then? It doesn't seem to do the update either.

If replacement is currently not available via command line switch, then is there a way to run whatever the URL https://wiki.umintmed.ca/index.php?title=Special:SwitchCargoTable/_fileData does from the command line? I could easily schedule a second line, but if it actually has to go to the URL then I would also have to manage login sessions and that quickly gets out of my knowledge range... Tenbergen (talk) 18:21, 12 December 2019 (UTC)

Alright, you convinced me to actually fix this issue. I just checked in to Cargo a patch that uses hooks to save (hopefully) the correct category data after every page save. If you can, please try out the latest code and let me know if it works for you! Yaron Koren (talk) 18:05, 13 December 2019 (UTC)
Hi Yaron, just getting back to this after installing updates for various things. Running Cargo 2.3.1 (24b5ef0) 2019-12-19T02:14:06 I have edited a page that had two categories (both added by templates) and added two more, one that has a page and one that doesn't yet. Now when I go to Page Values, the new categories show up, but the old ones don't any more. https://wiki.umintmed.ca/index.php?title=Tina_Tenbergen&action=pagevalues Tenbergen (talk) 18:34, 26 December 2019 (UTC)
I can't replicate that issue (though I do sometimes see some other strange behavior). Does this happen for you consistently, if you try it again? Yaron Koren (talk) 19:15, 26 December 2019 (UTC)

"In the future" and "In the past" filters on Special:Drilldown

I would find it useful to have these two options, for dates, in addition to the usual grouped date ranges. Would this be possible? Thanks. Jonathan3 (talk) 19:37, 11 December 2019 (UTC)

It's an interesting idea. I wonder if this kind of thing can be extended in some way... is there any other special behavior that would be useful for date fields where you want this kind of filtering? Email notification when they occur? Email reminders beforehand? Color-coding within tables, depending on whether they're in the past or future? Yaron Koren (talk) 16:48, 12 December 2019 (UTC)
Thanks. It would be useful for events, or anything with a closing date, for example, if the URL parameters could ensure only events "in the future" are displayed. It would be good if (by default, without that option used) the past and future events were displayed differently - this could be an option when the table is declared, though that might overcomplicate things. Jonathan3 (talk) 00:48, 14 December 2019 (UTC)

Error: operator for the virtual field must be "HOLDS" etc., if field name in WHERE clause and field value are the same or nearly the same

Hi everybody,

I face the following problem.

In a field named "country" I store a list (,) of strings. The values to be stored are "country-1", "country-2", "country-3" and "abc" (abc is just for testing purposes).

Now, I want to query the database for all projects which hold for instance "country-1". The query looks like this:

{{#cargo_query:tables=webmo_project|fields=name_short|group by=_pageID|where=country HOLDS "country-1"}}   <!-- I also tried where=country__full HOLDS "country-1" -->

Regrettably, I get this error message, although I used "HOLDS" as indicated

Error: operator for the virtual field 'webmo_project.country' must be 'HOLDS', 'HOLDS NOT', 'HOLDS LIKE' or 'HOLDS NOT LIKE'.

But, if I query for the test value "abc" everything works fine and I get the expected results from the query

{{#cargo_query:tables=webmo_project|fields=name_short|where=country HOLDS "abc"}}

I guess, that using similar names for the field name "country" and the field values like "country-1" is maybe causing this problem. Could this be a bug?

--Shuitavsshente (talk) 09:54, 12 December 2019 (UTC)

Might the problem be the hyphens in the field names? Jonathan3 (talk) 13:04, 12 December 2019 (UTC)
No, there are no hyphens in the field names - just in the string being looked for. Yes, I think this is a bug, due to overly-aggressive validation in Cargo. I can think of two ways to get around it: one is to change the query to "country__full LIKE '%country-1%'" ("HOLDS" won't work there), and the other is to rename the "country" field - to "country_name" or anything else. Yaron Koren (talk) 14:14, 12 December 2019 (UTC)
Thank you very much for the quick response. Regrettably, switching to HOLDS LIKE '%country-1%' does not work either. But here is a funny fact: I adjusted the where-clause to "country HOLDS LIKE '%ountry-1%'" (that is, I deleted the leading "c" of country) and afterwards it works fine. This seems to confirm my initial assumption that similar wording for field names and field values causes this problem. --Shuitavsshente (talk) 15:13, 12 December 2019 (UTC)
Oh, right, that would work too. I think you misunderstood my first suggestion, but as long as it works, there's no problem. Yaron Koren (talk) 16:28, 12 December 2019 (UTC)
"No, there are no hyphens in the field names - just in the string being looked for" - Sorry - I typed my message on the phone and hadn't read the question properly. Jonathan3 (talk) 00:43, 14 December 2019 (UTC)
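One thing to keep in mind with the "country__full LIKE '%country-1%'" workaround is that it is a plain substring match on the stored list text, so it would also match a longer value such as "country-10". The sketch below illustrates this with Python's sqlite3 module. The table contents are made up, and it assumes the list is stored as comma-separated text without spaces; the delimiter-wrapping variant is an illustrative refinement, not something taken from Cargo.

```python
import sqlite3

# Hypothetical stand-in for the Cargo table; only the "__full" list text is modeled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE webmo_project (name_short TEXT, country__full TEXT)")
conn.executemany(
    "INSERT INTO webmo_project VALUES (?, ?)",
    [
        ("A", "country-1,country-2"),
        ("B", "country-10"),
        ("C", "abc"),
    ],
)


def substring_match(value):
    """Plain LIKE '%value%': matches any list text containing the substring."""
    rows = conn.execute(
        "SELECT name_short FROM webmo_project "
        "WHERE country__full LIKE ? ORDER BY name_short",
        (f"%{value}%",),
    )
    return [r[0] for r in rows]


def delimited_match(value, delimiter=","):
    """Wrap both sides in the delimiter so that only whole list items match."""
    rows = conn.execute(
        "SELECT name_short FROM webmo_project "
        "WHERE (? || country__full || ?) LIKE ? ORDER BY name_short",
        (delimiter, delimiter, f"%{delimiter}{value}{delimiter}%"),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(substring_match("country-1"))  # includes the "country-10" row as well
    print(delimited_match("country-1"))  # whole-item match only
```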

Error handling stores duplicate rows

I think there's some kind of error handling that leads to duplication of rows. Here is a reproduction of one where a floating-point value is stored in an Integer field, and it ends up storing the row twice. I think there might be some handling where it thinks the row wasn't stored in the first place and then goes back and re-stores it later on (given the order that the stores are done in).

I'd propose a new default column added to every table that's called _warnings, or maybe 2, called _warningCodes (List of String indexed) and _warningText (not indexed) so you can query codes and also read full text. I'd definitely be interested in helping to write this though I think I'm pretty far away from understanding the overall extension to do it soon. But, in the meantime, if you know what might be going on to cause these duplicates, I think that would be nice to fix (maybe it could, for now, just enter the page into an error category kinda like Max Loops Exceeded that the Loops extension has, so that it's not failing silently, and also not duplicating rows). --RheingoldRiver (talk) 22:34, 12 December 2019 (UTC)

Oh, I also forgot to say, that reproduction did not work when I had only 2 columns in the table, see: here --RheingoldRiver (talk) 01:21, 13 December 2019 (UTC)

I'm guessing that's due to the fact that, e.g., a value of "1.97" is being compared to "2", the code sees they're not the same, and assumes that it should be a separate row. I'll have to add in a fix for this, along the lines of the "check the substring" patch that I also need to check in. :( I don't understand the "warning" stuff... is that related to this duplicates problem? And would you still want those added, if this specific issue is fixed? Yaron Koren (talk) 18:09, 13 December 2019 (UTC)
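The guess above, that "1.97" is compared with a stored "2" and treated as a different row, can be illustrated with a small sketch. This is not Cargo's code: the function names are hypothetical, and rounding to the nearest integer is an assumption about what the database does with a float in an integer column.

```python
def stored_form(value, field_type):
    """Approximate what the database would hold after storing `value` (assumed rounding)."""
    if field_type == "Integer":
        return str(round(float(value)))
    return value


def row_already_stored(submitted, stored_rows, field_types, normalize):
    """Check whether `submitted` matches a stored row, optionally normalizing by field type first."""
    if normalize:
        candidate = {
            name: stored_form(value, field_types[name])
            for name, value in submitted.items()
        }
    else:
        candidate = dict(submitted)
    return candidate in stored_rows


if __name__ == "__main__":
    field_types = {"name": "String", "size": "Integer"}
    stored_rows = [{"name": "Widget", "size": "2"}]
    submitted = {"name": "Widget", "size": "1.97"}
    # Raw comparison: "1.97" != "2", so the row looks new and would be stored again.
    print(row_already_stored(submitted, stored_rows, field_types, normalize=False))
    # Comparing the normalized form recognizes the row as already stored.
    print(row_already_stored(submitted, stored_rows, field_types, normalize=True))
```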
So the point of the warning stuff is like, atm the duplication only happens when there's an actual problem with the input, e.g. I was trying to save a floating point in an Integer column. Currently the only way to discover problems like this is to notice the duplication and fix it. So if the duplication is gone (which I really think it should be, even if it leaves us without any discovery method), then we'll just have inputs silently failing and possibly never getting noticed. In a lot of cases this won't matter, but I think there should at least be SOME ability to detect warnings. So I was thinking one of the better ways might be to add columns to each row for the purpose of discovering errors - if there's longer explanations those can be in a text-type field called _warningTexts, and then, in order to make queries on types of errors, there can be a List (,) of String (or Int) column called _warningCodes, or maybe _warnings and _warningCodes or something. Then stuff will work as best as it can without adding incorrect duplicate lines, and for the most part stuff will "just work", but then there's also a discovery method present for locating issues.
An (imo worse) alternative to putting warnings directly into the rows would be to have a cargo_store automatically populate categories on the page with the types of warnings - but I think this isn't as good an option because you don't get specificity to the individual line of where stuff went wrong. --RheingoldRiver (talk) 03:19, 14 December 2019 (UTC)
Well, this sounds like two separate, and basically unrelated, discussions. I'm pretty sure that the duplication problem can be fixed in one way or another. But the handling of un-allowed values is an interesting question. There already is handling of error values in the case of "enumeration" fields - fields that have a specified list of allowed values. For those, values that aren't in this list are simply ignored - not stored at all, while everything else gets stored normally. For these and other cases (like trying to store a float in an integer field, or storing a too-large text in a string field), there are a number of possibilities: (a) ignore the bad data (as is sometimes done now), (b) don't save that row at all, (c) try to "massage" the data as best as possible (like rounding numbers, and truncating strings), (d) display an error message on the screen, (e) add the page to one or more special categories, (f) add one or more fields to every Cargo table to store the precise error information. And in theory, I think any of (a), (b) and (c) can be done with any of (d), (e) and (f). What do you think? Yaron Koren (talk) 04:27, 16 December 2019 (UTC)
I think in general they are related just because it would be good to push both updates at the same time, or at least the error discovery one first, so that discovery isn't lost after duplication is fixed. But in practice for writing code they're probably unrelated, yeah. Those options sound like all of the options, yeah. I think if (c) is done, that should be settable via a preference, so there's an option to have an actual type check implemented, so (c) with a fallback to (a) in the case that either the data can't be fixed or that preference is set to "don't attempt to save anyway" (atm I think it's maybe sometimes doing this? Since the floating point saved as 0 when there were few columns in the table but then broke everything when there were more columns? Does that make sense for how the code is?) For reporting, maybe doing both (e) and (f) would work - people can choose to HIDDENCAT the reporting category(ies?) if they don't care to be notified of warnings, but it'll still be there to check for issues. I think a category is a lot better than a message to a page, since you can easily see all pages in the category, also that makes it easily accessible via api. --RheingoldRiver (talk) 07:50, 16 December 2019 (UTC)
What about (b) (don't save row), combined with (d) (error message on screen) and (e) (error categories)? I think we're better off knowing that what we have typed isn't being entered into the database exactly at the outset (so not (a) or (c)). Categorisation would be useful for dealing with errors later (e.g. time constraints, or maybe for the administrator to deal with errors that editors have ignored). This would probably make it unnecessary to have separate table rows for error messages (so not (f)). Jonathan3 (talk) 00:06, 17 December 2019 (UTC)
Thanks for the feedback. I actually thought of another option: (g) have a separate special Cargo table, called "_storageErrors" or something, which just held the errors for bad data. It could have fields like _pageID, _tableName, _rowID (only if the data is getting saved, though), _fieldName, _value, _errorID - something like that. Does that sound appealing? Yaron Koren (talk) 00:18, 17 December 2019 (UTC)
I'm pretty opposed to on-screen errors tbh, Loops extension does that and it's the worst, I ended up changing my i18n for it to print one word as LoopsErrorOnThisPage so I had that as a search string for discovery (can't do regex cos it's not insource!), which is a ridiculous workaround and imo shouldn't ever be an intended way to do things. Printing error messages is more likely to confuse users & make them scared to do edits in the case that they aren't able to do anything, and users who ARE able to do things can just check error population pages. A category is a much better option imo. Regarding a separate table, hm...that could also be possible to create it like _pageData, and then it's possible to do a display with all warnings across all tables instead of needing to enumerate for each table. The first reservation I have is that it might not be clear how to join the two together, since you need both a join with _rowID and also a where (to ignore identical _rowID but from other tables), but I think that could be provided in documentation. And the other concern is, would this write all of the lines all at once in the case there are errors/warnings? Or could this potentially double save time for a page? --RheingoldRiver (talk) 00:37, 17 December 2019 (UTC)
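The join concern raised above (a join on _rowID plus a WHERE on the table name) can be sketched as follows. This is only an illustration of the proposed "_storageErrors" table using Python's sqlite3 module: the error table does not exist in Cargo at this point, its column names are taken from the proposal, and the "_ID" row-identifier column, the sample tables and the error code are assumptions.

```python
import sqlite3


def build_demo_db():
    """Two data tables whose row IDs overlap, plus the proposed shared error table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Company ("_ID" INTEGER, "_pageID" INTEGER, "Number_of_employees" INTEGER);
        CREATE TABLE Product ("_ID" INTEGER, "_pageID" INTEGER, "Price" INTEGER);
        CREATE TABLE "_storageErrors" (
            "_pageID" INTEGER, "_tableName" TEXT, "_rowID" INTEGER,
            "_fieldName" TEXT, "_value" TEXT, "_errorID" TEXT
        );
        INSERT INTO Company VALUES (1, 10, 2);
        INSERT INTO Product VALUES (1, 11, 5);
        INSERT INTO "_storageErrors"
            VALUES (10, 'Company', 1, 'Number_of_employees', '1.97', 'not-an-integer');
        INSERT INTO "_storageErrors"
            VALUES (11, 'Product', 1, 'Price', 'cheap', 'not-an-integer');
        """
    )
    return conn


def errors_for_company(conn):
    """Join Company rows to their errors; the WHERE clause excludes other tables' rows."""
    return conn.execute(
        """
        SELECT c."_ID", e."_fieldName", e."_value", e."_errorID"
        FROM Company AS c
        JOIN "_storageErrors" AS e ON e."_rowID" = c."_ID"
        WHERE e."_tableName" = 'Company'
        ORDER BY c."_ID"
        """
    ).fetchall()


if __name__ == "__main__":
    print(errors_for_company(build_demo_db()))
```

Without the WHERE clause, the Product error would also attach to the Company row, because both rows have the same ID of 1.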

I thought more about this, based in part on all the feedback. Here are some of my thoughts:

  • The category system, (e), doesn't seem "granular" enough. If there are 100 fields on a page, and the page gets added to a category called "Pages with an invalid value for an integer field", how helpful is that to users?
  • The original idea of adding fields like "_warningCodes", which I listed as (f), similarly doesn't seem granular enough, surprisingly. Again, if there are dozens of fields in a table, and the _warningCodes lists some error codes for that row, there's no obvious way to know which fields/values those apply to.
  • I guess that leaves a dedicated DB table, (g), as the error storage solution. It's true that doing joins on it would be tricky, but it's doable - especially in PHP or Lua - and I don't see an alternative.
  • I still think having onscreen errors, (d), could make sense. I don't see error messages as necessarily confusing or scary, if they're clearly written and displayed in neutral colors. Something like "Field 'Number_of_employees' in table 'Company' must be an integer" is fairly clear, and not so different from the error/warning tags that appear at the top of some Wikipedia articles ("The neutrality of this article is disputed").
  • As for the saving of the data, I think I agree with RheingoldRiver's suggested approach of "(c) with a fallback to (a)". (b), or not saving the row at all, seems too extreme - especially if the value can't easily be changed. (Though I will note that (b) would make the row duplication problem easier.)

Any thoughts? Yaron Koren (talk) 17:39, 17 December 2019 (UTC)
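The "(c) with a fallback to (a)" approach from the list above could look roughly like this. It is a minimal sketch, not Cargo's implementation: the function name, the warning texts, and the choice not to warn about values like "1.0" are all assumptions made for illustration.

```python
import math


def massage_value(value, field_type, max_length=None):
    """Return (stored_value, warning); stored_value is None when the value is ignored."""
    if field_type == "Integer":
        try:
            number = float(value)
        except ValueError:
            # Option (a): the value cannot be repaired, so it is ignored.
            return None, f"'{value}' is not a number; value ignored"
        if not math.isfinite(number):
            return None, f"'{value}' is not a finite number; value ignored"
        if number.is_integer():
            # "1.0" is accepted as 1 without a warning (an assumed design choice).
            return int(number), None
        # Option (c): massage the value by rounding it.
        rounded = round(number)
        return rounded, f"'{value}' rounded to {rounded}"
    if field_type == "String" and max_length is not None and len(value) > max_length:
        # Option (c): massage the value by truncating it.
        return value[:max_length], f"value truncated to {max_length} characters"
    return value, None


if __name__ == "__main__":
    for raw in ("3", "1.0", "1.97", "many"):
        print(raw, massage_value(raw, "Integer"))
    print(massage_value("abcdef", "String", max_length=3))
```

The warning string returned alongside the value is what an error table, a category or an onscreen message could then report.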

I'd be happy with any of (a), (b) or (c), as long as the error message (d) says what has happened. I suppose an option to show no error messages could be added to the table declaration (or Cargo LocalSettings.php variable, or hidden using CSS). The separate error table sounds good. Jonathan3 (talk) 00:45, 18 December 2019 (UTC)
I figured the error codes text would say like, "Code [code] - Found value [val] type [type] in field [field], type [type] expected!" So even if you can't join to find the exact issue, the error text in the text-type field would explain the code to you. I think this is how it would have to work regardless of it being in the same table or a different table, since we can't enumerate all fields in all tables. I do agree a separate table of just errors sounds good. RE: category/onscreen, I agree if it's done this way, it absolutely must have a setting to disable, how about a declaration like FieldName=String (warning_text:disable,warning_category:disable,warning_table:enable), with ability to enable/disable each part of the error action? And maybe a localsetting to adjust the default for each part? --RheingoldRiver (talk) 16:00, 18 December 2019 (UTC)
Jonathan3 - great, I'm glad we more or less agree. RheingoldRiver - why would you want different handling for different fields? Yaron Koren (talk) 16:03, 18 December 2019 (UTC)
Maybe this happens already to an extent, but could some of the checking be done by Page Forms (in addition to Cargo), e.g. when 1.5 is entered into an integer field, or when a too-long string is entered? That way, the page wouldn't get saved until the error gets fixed. Jonathan3 (talk) 00:18, 19 December 2019 (UTC)
Page Forms already does a lot of validation, though it doesn't validate integer fields. For some reason that never came up before (well, a big part of the reason is that Semantic MediaWiki doesn't have an "Integer" type - only Cargo does). It definitely should do integer fields. I'm less sure about string lengths. Page Forms already has a "maxlength" parameter you can apply to text inputs and textareas, but it doesn't automatically apply that checking based on the Cargo field size. There might be cases where people only want to store the first X characters of a field, but want to allow any length for that field on the page. Though I can't think of any such case right now.
Anyway, whatever happens with Page Forms, Cargo still needs its own validation and error handling setup, because not all wikis use Page Forms, and even on the ones that do, users can always create or modify pages by hand, or with scripts, etc. Yaron Koren (talk) 02:15, 19 December 2019 (UTC)
I think it's pretty likely there will be some times that I don't care about errors - e.g. if there's an integer field and someone populates a template with 1.0 instead of 1, I just really wouldn't care. Or maybe I have a string-type field that's deliberately only storing the first 100 characters for some where condition, and I actually only care about the first couple words in the sentence, it's easier to take advantage of Cargo's automatic "error correction" than to trim myself. Also, one conceivable error/warning that Cargo might want to throw is, "duplicate line attempted to be stored in this table on this page!" But in fact I'm doing this deliberately (for a complicated unicode reason....) so I'd want to be able to quiet those warnings too. Since warnings are going to be in their own table, it would be reasonable to want this table to always be empty, and having a permanent record of warnings I know I'll never care about seems not so great. Having the ability to set per type would also allow for another thing I've wanted (and kinda implemented for myself in a complicated Lua/JS way), which is to have description strings settable per field (see for example here). This could be stored in an internal table with columns FieldName, Table, IndexInTable, Description, ErrorHandling that gets queried once per save maybe?
All that said, just having a localsettings entry to set global preference for onpage display/category/entry in table would probably be totally fine, if per-field adds too much overhead. --RheingoldRiver (talk) 18:03, 21 December 2019 (UTC)

Alright. I'm still not sure that preferences of any kind would be that useful... but let's go through these examples:

  • I'm not sure a value of 1.0 in an Integer field needs to be flagged as an error at all, but that might be a matter of opinion.
  • If there are strings that are too long for their storage, you could easily get rid of error messages by, for example, calling the ParserFunctions #sub function within the relevant field(s) in #cargo_store. That might be what you meant by "trim myself", but it doesn't seem like a big deal...
  • I don't understand that "duplicate line" potential warning. Can you explain that in more detail?
  • Having a per-field description sounds useful (and your current workaround is pretty neat), but it doesn't seem related to this error-handling thing. Wouldn't this be best handled by some sub-parameter, like "description=", within #cargo_declare? Yaron Koren (talk) 01:17, 24 December 2019 (UTC)


Duplicate rows thing - I need an intermediate table to go between a record about something a player did, and the player's page, since it could be a redirect. But, if I use the built-in _pageData table, I get duplication that I don't want due to case/special character non-sensitivity in Cargo being different from MediaWiki's. Cargo's sensitivity is also different from Lua's, so I can't do preprocessing in Lua either to get only 1 row per unique-in-Cargo redirect. So, my solution is to store all rows on a page, with just 1 column: the name of a redirect to this page. So, Cargo will "incorrectly" store only one single row per unique-in-Cargo case/special character redirect that exists. Then I can use this as my intermediate table in my joins and everything is happy. But, this behavior by Cargo to only store 1 row despite the fact that I might be "attempting" to store several, could be seen as a bug that should be flagged of "row not storing when it should be." But of course for me these aren't errors, it's intentional. (You could argue that I shouldn't be doing this in the first place, and I'd agree lol, but I can't think of any other way to be able to perform this join without getting duplication when 2 redirects e.g. SOAZ and Soaz both exist).
Warnings settings in general - Maybe this is less of a problem for newer wikis, but I'm just really apprehensive of the idea of suddenly having hundreds of thousands of potential error texts displaying on the wiki. I guess the category/error table are probably fine to always do since the category can be removed with HIDDENCAT and the error table can be ignored, but showing text on a page without a way to disable it, for things that aren't necessarily super huge emergencies to fix, sounds like a really bad idea to me - I'd much rather have the option to keep allowing stuff to fail silently so that the importance of bugs isn't incorrectly elevated due to the reporting method. I think the more granularity the better for this, which is why I was thinking per field, but I think at minimum there needs to be an option to turn off an on-screen warning.
And re description - yeah, that's unrelated in concept to this, but it might be similar to how per-field error reporting settings might be created, both user-facing and also in implementation, so it seemed worthwhile to bring up in case that makes a difference for how reasonable it is to do per-field settings. --RheingoldRiver (talk) 22:19, 24 December 2019 (UTC)
I have to confess, I read through that duplicates explanation about three times and I still don't understand it. :( But if the idea is that a page contains more than one identical template call - Cargo has no way to distinguish between that and just "regular" duplicate storage of the kind that already happens, so it just ignores all duplicates; there would be no warning message.
To be clear, I still don't think categories should be used at all for Cargo errors.
It's true that lots of error displays might be a problem, especially for users who can't do anything about it. One option is to have a "permission" for viewing Cargo error messages, so that, for example, only admins will see the error messages. That's more or less what the Approved Revs extension does. Yaron Koren (talk) 16:28, 25 December 2019 (UTC)
Maybe the module that does it explains? I do pick unique cases here, but to Cargo, special characters (e.g. accented I) are interchangeable with the non-special version, and that's not something I can really tell Lua to select. This table gets used in joins to associate all names of a player together, based on their targets. But, if I have 2 entries that Cargo treats as being identical in a join, then I'd get unwanted duplication in my result. I'm not sure if this is what you meant by identical template calls, these rows are not duplicate, but they are duplicate up to case/special character sensitivity, so Cargo only stores one line.
Categories are nice because you can look at a large report of them all at once to verify emptiness, whereas a similar setup for Cargo queries is harder to at-a-glance pull information from - though I suppose I could set up some system where these themselves store the number of results in each query and then that's loaded via JS onto the parent category page - but having a category that's supposed to be empty is still in general my preferred way of being notified about issues.
Yeah a permission sounds fine, giving it to no one is the same as disabling warnings. --RheingoldRiver (talk) 20:53, 25 December 2019 (UTC)
Well, I'm still not sure I understand... but again, I can't imagine that there would be a "duplicate row" error/warning, so maybe that settles the issue?
As for categories: if the errors are stored in a Cargo table, their contents would be viewable at a glance from both Special:CargoTable and Special:Drilldown. Wouldn't that be enough to get a quick sense of the contents? Yaron Koren (talk) 04:49, 26 December 2019 (UTC)
Yeah, if there's no "duplicate row" warning then this doesn't matter. TLDR I'm taking advantage of a maybe-issue in Cargo and I really hope you don't fix it ever LOL (which it sounds like you won't). Re categories, I can check all my error-reporting categories at the same time on that one page, though, including ones completely unrelated to Cargo, which is pretty convenient. --RheingoldRiver (talk) 15:45, 26 December 2019 (UTC)

"Use of Parser title should never be null was deprecated in MediaWiki 1.34."

I set up MW1.34 this weekend, and am still running it in debug mode on one wiki for other reasons. Just now, when I clicked the refresh tab on a "page values" page of a page, it gave the following error (hiding identifiables from addresses):

Deprecated: Use of Parser title should never be null was deprecated in MediaWiki 1.34. [Called from CargoUtils::smartParse in /home/xxx.ca/extensions/Cargo/includes/CargoUtils.php at line 479] in /home/xxx.ca/includes/debug/MWDebug.php on line 333
Deprecated: Use of ParserOptions::setWrapOutputClass( false ) was deprecated in MediaWiki 1.31. [Called from CargoUtils::smartParse in /home/xxx.ca/extensions/Cargo/includes/CargoUtils.php at line 502] in /home/xxx.ca/includes/debug/MWDebug.php on line 333

Is this the right place to flag that? Tenbergen (talk) 02:34, 23 December 2019 (UTC)

This is indeed a valid place to report these issues. The second one is real, and you actually first reported it about six months ago. It still needs to be fixed, of course. The first one seems quite mysterious. What do you have on line 479 of CargoUtils.php? Yaron Koren (talk) 01:23, 24 December 2019 (UTC)
Line 479 of CargoUtils.php is the one starting with "$title" in the following:
 		// Parse it as if it's wikitext. The exact call
		// depends on whether we're in a special page or not.
		global $wgRequest;
		if ( is_null( $parser ) ) {
			$parser = MediaWikiServices::getInstance()->getParser();
		}
		$title = $parser->getTitle();
		if ( is_null( $title ) ) {
			global $wgTitle;
			$title = $wgTitle;
		}
However, both errors are gone now, together with some unrelated problems. I wonder if something else is just masking them; there seems to be no reason why these errors should be gone now. Tenbergen (talk) 02:43, 24 December 2019 (UTC)

"undefinedundefined" after month in calendar format

When I use the calendar format, the month doesn't display as "December 2019", but instead as "December 2019undefinedundefined". Interestingly, it does this for both the Cargo and the SMW calendar format. Anyone know why that would be? Tenbergen (talk) 05:14, 23 December 2019 (UTC)

I haven't seen that problem before. Do you see any JavaScript errors in the browser console on that page? Yaron Koren (talk) 01:26, 24 December 2019 (UTC)
No, there were no JavaScript errors in the console. But the problem has gone away, and so have some other JavaScript problems, so maybe that was the problem after all. Thanks for the pointer! Tenbergen (talk) 02:16, 24 December 2019 (UTC)

controlling where a calendar entry links to

I had posted a question here earlier this year about how to control where an entry/event on a calendar links to. I thought I got it working back then, but it turns out I didn't. When I thought it worked I asked if it was a matter of making the link value the first value returned by the query, but never heard back on that. Does anyone know how to do this? Tenbergen (talk) 05:14, 23 December 2019 (UTC)

any magic word on a page can be added as a field in _pageData: Please add REVISIONSIZE

In an earlier discussion it was requested to add more magic words to the list for _pageData. I have a use for the REVISIONSIZE and the ones under the "Technical metadata of another page" including the PAGESIZE and PROTECTIONLEVEL. Thank you for your consideration. Tjoneslow (talk) 14:10, 24 December 2019 (UTC)

I hadn't heard of any of these before. Some kind of "page length" field does sound like a good idea. But PAGESIZE and PROTECTIONLEVEL are both parser functions that take in a parameter. How would that work? (Plus, REVISIONSIZE and PAGESIZE are for the same thing, as far as I can tell.) Yaron Koren (talk) 15:34, 24 December 2019 (UTC)
I've discovered that {{PAGESIZE:{{PAGENAME}}|R}} works with the current Cargo version, which gets me what I wanted (the size of the wikitext stored in the table). I think for the parser functions I wanted just the current page. Tjoneslow (talk) 03:18, 29 December 2019 (UTC)
PROTECTIONLEVEL doesn't take in a page name... it sounds like all you want is a "_pageLength" field (or maybe it would be called "_numBytes"). Or maybe you don't even need that any more? Yaron Koren (talk) 16:43, 29 December 2019 (UTC)
I have a workaround for the page size that works for me. But I still think the general idea of "_numBytes" field could be useful to others. It could probably go on the "Nice To Have" list. I can think of a use case for the PROTECTIONLEVEL, but I don't currently have a use for it. Tjoneslow (talk) 23:33, 29 December 2019 (UTC)
Alright, a _numBytes field sounds good (and that's probably the right name for it). That's in addition to the previous fields planned to be added to _pageData: _rootPageName, _basePageName and _subPageName. Yaron Koren (talk) 04:39, 30 December 2019 (UTC)

incomplete csv download

I should have 260+ rows in my downloaded csv file, but I am only getting 101 rows. It is a simple setup: one template creating a table.

What do I need to look at to fix? Thanks, Margaret

Add a "limit" value to the query/URL, like "limit=500". Yaron Koren (talk) 18:51, 3 January 2020 (UTC)
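For anyone scripting the download, here is a minimal sketch of the advice above: appending "limit=500" to an export URL. The wiki address, the Special:CargoExport parameters, and the helper name `with_limit` are assumptions made for illustration; substitute whatever URL your own CSV link uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_limit(url, limit):
    """Return `url` with its "limit" query parameter set to `limit`.

    Hypothetical helper: any existing "limit" value is replaced,
    all other query parameters are kept in their original order.
    """
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "limit"
    ]
    params.append(("limit", str(limit)))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Example URL is an assumption, not taken from the discussion above.
example = (
    "https://wiki.example.org/index.php?title=Special:CargoExport"
    "&tables=Items&fields=_pageName&format=csv"
)
print(with_limit(example, 500))
```

The same effect can be had by hand: add "&limit=500" to the end of the download link, or "|limit=500" to the #cargo_query call that generates it.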

Multiple-Condition joins?

Is this intentionally disallowed in Cargo? Could it be allowed? Or am I doing something wrong? join on=(SP.OverviewPage=TP._pageName AND SP.Link=TP.Link) (or any multiple-condition join I try) returns an error message "Table and field name must both be specified in 'TP._pageName AND SP.Link'." (Also, if you think this would be easy to do, maybe I can try implementing myself) --RheingoldRiver (talk) 19:47, 4 January 2020 (UTC)

You just need to separate the two parts with "," instead of " AND ". (And take out the parentheses.) This is shown in a few of the examples for #cargo_query in the documentation, but I realized it's never actually spelled out. I just added a little bit about it to the documentation. Yaron Koren (talk) 03:24, 6 January 2020 (UTC)
Oh, that makes sense, I don't know why I didn't try that tbh, I just assumed each item in the comma-separated list had to belong to a different set of tables. Thanks! --RheingoldRiver (talk) 07:05, 6 January 2020 (UTC)
Oh, I didn't realize that these two were for the same pair of tables. It's good to know that that works. Yaron Koren (talk) 15:55, 6 January 2020 (UTC)
Oh LOL - I'm not actually 100% sure it's working properly when there's 3+ tables total involved, I was getting results other than what I expected when I did this. I ended up getting something to work but it wasn't what I initially expected to do.
{{#cargo_query:tables=ScoreboardPlayer=SP,TournamentPlayers=TP,PlayerRedirects=PR1,PlayerRedirects=PR2,TournamentPlayers=TP2
|join on=
 SP.OverviewPage=TP._pageName,
 SP.Link=TP.Link,
 SP.Link=PR1.AllName,
 PR1._pageName=PR2._pageName,
 PR2.AllName=TP2.Link
|fields=SP.Link, CONCAT('[[',SP.OverviewPage,']]')=Page
|where=TP._pageName IS NULL
|having=MAX(TP2.Link) IS NULL
|group by=SP.Link,SP.OverviewPage
}}
This was my final query, I wanted to check when a player was missing from a manual list of everyone who participated in a tournament, but the two names used could be different and I didn't want false positives. When I get a chance I'll try and recreate the version that didn't work, though this query was complicated enough that I'm unconvinced it's Cargo's fault in this case and not mine. --RheingoldRiver (talk) 06:37, 7 January 2020 (UTC)
Actually looking at this again I'm unconvinced this is what I want lol, maybe I need to play with this more still.... --RheingoldRiver (talk) 06:42, 7 January 2020 (UTC)
Ah ok I just needed one additional join condition of TP._pageName=TP2._pageName. It looks like the first copy of TournamentPlayers isn't doing anything, but actually that restricts the possible fields enough that the HAVING on a join this size doesn't time out. So, it seems actually this is fine, I probably do need this much complexity in the query. I have some cases where I created fields that were concatenations of other fields just for the sake of a join in the past, so I'll try changing those to use this method instead so I can delete the concatenation fields, I'll update if anything doesn't work. --RheingoldRiver (talk) 06:49, 7 January 2020 (UTC)
Alright. For performance reasons, it may make sense to replace some of these joins with "where" additions. For instance, that last join you mentioned may be better done by adding " AND TP2._pageName IS NULL" (if I understand the query correctly). And maybe that "having" clause can also be replaced by "TP2.Link IS NULL" in "where"? I may be simply misunderstanding the query, though. Yaron Koren (talk) 14:27, 7 January 2020 (UTC)
So I need to find an entry of SP.Link where the corresponding TP._pageName is null. That's simple enough: just put those 2 conditions in the join and set TP._pageName IS NULL in the where. The problem is, this falsely includes players where the names don't match. So I need SP.Link -> PR1 -> PR2 -> TP.Link to match up, and also for SP._pageName = TP._pageName in this case (otherwise it could be finding a place where a player is in the participants list of one event but not another). I need the HAVING because PR._pageName -> PR.AllName is one-to-many, so if I just say WHERE TP2.Link IS NULL, it'll say "oh it's satisfied for one of the PR.AllName so let's include this" and I get a bunch of false positives. So the where TP1 is null narrows down my possible solution space, and then the having all TP2.Link be null removes all false positives from that solution space. I can't change TP._pageName=TP2._pageName to be a WHERE condition because somewhat counterintuitively, I DON'T want that to be true - I want TP2._pageName to be NULL. (TP._pageName = TP2._pageName OR TP2._pageName IS NULL in the where condition also doesn't work, for the same reason that the having is necessary - I get a bunch of false positives in which this is true for one but not all of the names available from PR.) So the final query was:
{{#cargo_query:tables=ScoreboardPlayer=SP,TournamentPlayers=TP,PlayerRedirects=PR1,PlayerRedirects=PR2,TournamentPlayers=TP2
|join on=
 SP.OverviewPage=TP._pageName,
 SP.Link=TP.Link,
 SP.Link=PR1.AllName,
 PR1._pageName=PR2._pageName,
 PR2.AllName=TP2.Link,
 TP._pageName=TP2._pageName
|fields=SP.Link, CONCAT('[[',SP.OverviewPage,']]')=Page
|where=TP._pageName IS NULL
|having=MAX(TP2.Link) IS NULL
|group by=SP.Link,SP.OverviewPage
}}
and I'm pretty sure there's no way to simplify without adding in a bunch of false positives due to the one-to-many relationship. It occurs to me though that I might be better off making a new version of PlayerRedirects that has all ordered pairs of names in it, so that I can simplify queries somewhat by having only one single copy of that table in the join...maybe I'll set it up that way sometime. --RheingoldRiver (talk) 01:17, 8 January 2020 (UTC)
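The WHERE-versus-HAVING point in this thread can be reproduced outside of Cargo. The following is only a simplified sketch using Python's sqlite3 module: the table names, column names and sample players are invented for illustration, and the first copy of TournamentPlayers from the real query is left out. It shows why "WHERE TP2.Link IS NULL" produces false positives when one player has several names, while "HAVING MAX(TP2.Link) IS NULL" does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scoreboard (link TEXT, page TEXT);          -- stands in for ScoreboardPlayer
CREATE TABLE redirects (page_name TEXT, all_name TEXT);  -- stands in for PlayerRedirects
CREATE TABLE participants (page TEXT, link TEXT);        -- stands in for TournamentPlayers

-- 'Alpha' played under the old name 'AlphaOld' but is listed as 'Alpha'.
-- 'Beta' played but is genuinely missing from the participants list.
INSERT INTO scoreboard VALUES ('AlphaOld', 'Event1'), ('Beta', 'Event1');
INSERT INTO redirects VALUES ('Alpha', 'Alpha'), ('Alpha', 'AlphaOld'), ('Beta', 'Beta');
INSERT INTO participants VALUES ('Event1', 'Alpha');
""")

BASE = """
SELECT sp.link
FROM scoreboard AS sp
LEFT OUTER JOIN redirects AS pr1 ON sp.link = pr1.all_name
LEFT OUTER JOIN redirects AS pr2 ON pr1.page_name = pr2.page_name
LEFT OUTER JOIN participants AS tp2
  ON pr2.all_name = tp2.link AND tp2.page = sp.page
{where}
GROUP BY sp.link, sp.page
{having}
ORDER BY sp.link
"""


def missing_players(where="", having=""):
    """Run the base query with an optional WHERE or HAVING clause."""
    rows = conn.execute(BASE.format(where=where, having=having)).fetchall()
    return [row[0] for row in rows]


# WHERE keeps any joined row whose name is unmatched, so one unmatched
# alias is enough to report the player as missing (a false positive).
with_where = missing_players(where="WHERE tp2.link IS NULL")

# HAVING MAX(...) IS NULL requires that *every* alias is unmatched.
with_having = missing_players(having="HAVING MAX(tp2.link) IS NULL")

print(with_where)   # ['AlphaOld', 'Beta']
print(with_having)  # ['Beta']
```

MAX ignores NULLs, so it is only NULL when none of the player's names found a match; that is what makes the HAVING form an "all names are missing" test.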

Multiple-Condition joins (another case)

{{#cargo_query: tables = a, b
| join on = a.p=b.p, a.q=b.q
| fields = a.p=ap, b.p=bp, a.q=aq, b.q=bq, a.x
| where = a.x="xxx" OR ( b.x="xxx" and b.y=1)
}}

I get a different number of rows depending on the order of conditions in JOIN. Only the last condition is used. How can this be explained? What am I doing wrong? --StasR (talk) 09:22, 9 January 2020 (UTC)

I'm not surprised that the order matters - small changes can make a big difference, as also seen in the section above. Are you sure that only the last "join on" condition is used? Yaron Koren (talk) 17:22, 9 January 2020 (UTC)
The order here can affect the performance, but not the result. It seems to me that the above request should be equivalent to this SQL query:
SELECT a.p AS ap, b.p AS bp, a.q AS aq, b.q AS bq, a.x 
  FROM a JOIN b 
  ON a.p=b.p AND a.q=b.q 
  WHERE a.x="xxx" OR ( b.x="xxx" AND b.y=1)
The SQL result does not depend on the order, and it returns far fewer rows, because both ON conditions apply. --StasR (talk) 18:48, 9 January 2020 (UTC)
Are you sure that only the last "join on" condition is used? Yes, I'm sure. I made a simple illustrative example. --StasR (talk) 14:34, 10 January 2020 (UTC)
Okay, thanks for putting that together. I think the issue is that Cargo uses "LEFT OUTER JOIN" for joins, which is why you're seeing that changing behavior. Actually, just switching the left and right sides of each join will probably lead to different results too. I don't remember now why I chose "LEFT OUTER JOIN". It's certainly worth discussing - maybe a simple "JOIN" would be better. Or maybe the syntax should allow people to set the join type. (If you want to try a different kind of join, by the way, the relevant code is in /includes/CargoSQLQuery.php - there are four instances of "LEFT OUTER JOIN" that would all need to be changed.) Yaron Koren (talk) 15:17, 10 January 2020 (UTC)
"LEFT OUTER JOIN", and even exchanging the main and joined tables, does not change the SQL result (I checked). I think Cargo overwrites the previous element instead of adding a new element when parsing the JOIN conditions. --StasR (talk) 16:35, 10 January 2020 (UTC)
Yes, indeed! You're right. I think the previous section made me a little overly-optimistic about multiple joins on the same tables. I just checked in what I think is a fix for this. Maybe no change to "LEFT OUTER JOIN" is necessary. Yaron Koren (talk) 20:02, 10 January 2020 (UTC)
After this fix is the syntax the same or will you put multiple-condition joins in one line together with AND ? --RheingoldRiver (talk) 18:26, 12 January 2020 (UTC)
No, the syntax is the same; what happened before is that earlier join conditions were just getting overwritten. Yaron Koren (talk) 00:49, 13 January 2020 (UTC)
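To illustrate the symptom described in this thread, here is a small sketch in plain SQL via Python's sqlite3 module, not Cargo's actual code. The tables and rows are made up. It compares a LEFT OUTER JOIN that applies both conditions with one that applies only a single condition, which is what the overwriting bug effectively did.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (p INTEGER, q INTEGER);
CREATE TABLE b (p INTEGER, q INTEGER);
INSERT INTO a VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO b VALUES (1, 1), (1, 3), (2, 1), (3, 1);
""")


def count_rows(on_clause):
    """Count the rows produced by a LEFT OUTER JOIN b with the given ON clause."""
    sql = f"SELECT COUNT(*) FROM a LEFT OUTER JOIN b ON {on_clause}"
    return conn.execute(sql).fetchone()[0]


# Both conditions applied: the order of the conditions does not matter.
both = count_rows("a.p = b.p AND a.q = b.q")
both_swapped = count_rows("a.q = b.q AND a.p = b.p")

# Only one condition applied, as if the other had been overwritten:
# the result now depends on which condition happened to come last.
only_p = count_rows("a.p = b.p")
only_q = count_rows("a.q = b.q")

print(both, both_swapped)  # 3 3
print(only_p, only_q)      # 5 7
```

So a row count that changes when the "join on" conditions are reordered points to a condition being dropped, rather than to the join type.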

Storing correct redirects in _pageData (again)

Would a solution for this be to store _pageData once at the normal time, so that it can be used in processing the rest of the page, and then go back and re-store it again at the very end of page save? This would require an extra blank edit for a page's categories to apply to itself of course, but it would still be a huge improvement over current. If this seems reasonable to you I'll try and implement myself. --RheingoldRiver (talk) 03:14, 10 January 2020 (UTC)

What's the problem that requires a solution? Yaron Koren (talk) 04:13, 10 January 2020 (UTC)
Oh I wrote redirects, I meant categories, sorry. When wrong categories are stored the first time they get added on pages. --RheingoldRiver (talk) 05:27, 10 January 2020 (UTC)
Oh, okay. Well, you might be in luck! The latest version of Cargo, 2.4 (just released yesterday) contains an attempted fix for the category storage problem, added in December. If/when you get the latest version, hopefully the category setting will work better for you... Yaron Koren (talk) 14:57, 10 January 2020 (UTC)
Oh yay, thanks! I'll open a ticket for us to update. --RheingoldRiver (talk) 15:10, 10 January 2020 (UTC)

Incorrect Drilldown

Cargo v. 2.3.1. The table contains 4125 rows. As can be seen on this screenshot, for all the fields that show a complete list of values, the counts are much smaller than expected. (I beg your pardon, porno is just an abbreviation for sequence number in Russian :-) --StasR (talk) 18:00, 12 January 2020 (UTC)

Yaron, do I understand correctly that the sum of value counters for each field must be equal to the number of rows of the table? --StasR (talk) 10:06, 14 January 2020 (UTC)

I'm guessing that the issue is the overlap. So, for example, the first "bucket" is 0-60, while the second is 60-200, so a page that had a value of 60 for that field would get listed in both. I'm not sure what's the best handling for that kind of thing. The problem with making the second bucket 61-200 instead is that there might be non-integer values - so a page that had a value of 60.5 would not show up anywhere. You could have something like ">60-200", but that might look awkward. Or maybe the current approach is fine? This is all based on the assumption that that is indeed the issue - maybe it's something else. (Another option is that there are pages that have more than one value for this field, in different buckets.) Yaron Koren (talk) 14:34, 14 January 2020 (UTC)
The field 'locno' is an integer from 1 to 4 (no NULL). And the sum (23+10+20+1=54) is considerably less than the number of rows (4125). The sums for all fields with a complete enumeration are very small. --StasR (talk) 16:50, 14 January 2020 (UTC)
Oh, the number within the buckets is smaller than it should be. For that "locno" field - which of these values should have a bigger amount? Or is it all of them? Yaron Koren (talk) 17:11, 14 January 2020 (UTC)
Statistics for "locno" and "event" (collapsed):
 locno 	COUNT( locno ) 	
1 	3472
2 	619
3 	33
4 	1

 event 	COUNT( event ) 	
1 	36
2 	104
3 	1057
6 	207
7 	268
8 	367
13 	466
14 	4
15 	42
16 	418
17 	2
18 	396
19 	149
20 	1
21 	377
22 	174
23 	22
24 	2
25 	30
26 	2
27 	1

--StasR (talk) 19:20, 14 January 2020 (UTC)

That's very strange... are there a lot of rows per page, or just one? Yaron Koren (talk) 19:59, 14 January 2020 (UTC)
That's very-very strange. Pages generate a large number of rows (but I do so in many projects). The data is written in full, and rewriting does not change these amounts. In which module are the Drilldown results generated? I'll try to trace and debug it. --StasR (talk) 09:22, 15 January 2020 (UTC)
Okay, now it's less surprising. Special:Drilldown, for better or worse, breaks down the data by pages and not rows. So if there are 10 rows in a page all matching a filter value, that will show up as one result, not 10, in both the filter listings and the actual results. (At least, that's how it's supposed to work.) Does the data make more sense now? Or are the numbers still off? Yaron Koren (talk) 14:15, 15 January 2020 (UTC)
Then it explains everything, thank you! --StasR (talk) 18:15, 15 January 2020 (UTC)
Great. By the way, if you can think of some way to make this clearer in the interface - or, for that matter, if you think the results should change to refer to rows and not pages - let me know. Yaron Koren (talk) 14:00, 16 January 2020 (UTC)
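The difference between counting rows and counting pages, which resolved this thread, can be seen in a small sketch. This uses Python's sqlite3 module with invented data and column names; it is not how Special:Drilldown is implemented, only an illustration of why per-page counts can be far smaller than per-row counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (page_name TEXT, locno INTEGER);
-- Two pages, each storing several rows.
INSERT INTO data VALUES
  ('Page A', 1), ('Page A', 1), ('Page A', 1), ('Page A', 2),
  ('Page B', 1), ('Page B', 1);
""")


def counts(expression):
    """Return {locno: count} using the given SQL aggregate expression."""
    sql = f"SELECT locno, {expression} FROM data GROUP BY locno ORDER BY locno"
    return dict(conn.execute(sql).fetchall())


# What a plain COUNT(locno)-style query reports: one per stored row.
row_counts = counts("COUNT(*)")

# What a per-page breakdown reports: one per page that has a matching row.
page_counts = counts("COUNT(DISTINCT page_name)")

print(row_counts)   # {1: 5, 2: 1}
print(page_counts)  # {1: 2, 2: 1}
```

With many rows per page, as in the table discussed above, the per-page numbers will sum to far less than the total row count.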