Extension talk:Cargo

incomplete csv download
I should have 260+ rows in my csv file downloaded but I am only getting 101 rows. It is a simple one template creating a table.

What do I need to look at to fix? Thanks, Margaret


 * Add a "limit" value to the query/URL, like "limit=500". Yaron Koren (talk) 18:51, 3 January 2020 (UTC)

Multiple-Condition joins?
Is this intentionally disallowed in Cargo? Could it be allowed? Or am I doing something wrong? (or any multiple-condition join I try) returns an error message "Table and field name must both be specified in 'TP._pageName AND SP.Link'." (Also, if you think this would be easy to do, maybe I can try implementing myself) --RheingoldRiver (talk) 19:47, 4 January 2020 (UTC)


 * You just need to separate the two parts with "," instead of " AND ". (And take out the parentheses.) This is shown in a few of the examples for #cargo_query in the documentation, but I realized it's never actually spelled out. I just added a little bit about it to the documentation. Yaron Koren (talk) 03:24, 6 January 2020 (UTC)


 * Oh, that makes sense, I don't know why I didn't try that tbh, I just assumed each item in the comma-separated list had to belong to a different set of tables. Thanks! --RheingoldRiver (talk) 07:05, 6 January 2020 (UTC)


 * Oh, I didn't realize that these two were for the same pair of tables. It's good to know that that works. Yaron Koren (talk) 15:55, 6 January 2020 (UTC)


 * Oh LOL - I'm not actually 100% sure it's working properly when there's 3+ tables total involved, I was getting results other than what I expected when I did this. I ended up getting something to work but it wasn't what I initially expected to do.


 * This was my final query, I wanted to check when a player was missing from a manual list of everyone who participated in a tournament, but the two names used could be different and I didn't want false positives. When I get a chance I'll try and recreate the version that didn't work, though this query was complicated enough that I'm unconvinced it's Cargo's fault in this case and not mine. --RheingoldRiver (talk) 06:37, 7 January 2020 (UTC)


 * Actually looking at this again I'm unconvinced this is what I want lol, maybe I need to play with this more still.... --RheingoldRiver (talk) 06:42, 7 January 2020 (UTC)


 * Ah ok I just needed one additional join condition of . It looks like the first copy of TournamentPlayers isn't doing anything, but actually that restricts the possible fields enough that the HAVING on a join this size doesn't time out. So, it seems actually this is fine, I probably do need this much complexity in the query. I have some cases where I created fields that were concatenations of other fields just for the sake of a join in the past, so I'll try changing those to use this method instead so I can delete the concatenation fields, I'll update if anything doesn't work. --RheingoldRiver (talk) 06:49, 7 January 2020 (UTC)


 * Alright. For performance reasons, it may make sense to replace some of these joins with "where" additions. For instance, that last join you mentioned may be better done by adding " AND TP2._pageName IS NULL" (if I understand the query correctly). And maybe that "having" clause can also be replaced by "TP2.Link IS NULL" in "where"? I may be simply misunderstanding the query, though. Yaron Koren (talk) 14:27, 7 January 2020 (UTC)


 * So I need to find an entry of SP.Link where the corresponding TP._pageName is null. That's simple enough, just put those 2 conditions in the join and set TP_pageName is null in the where. The problem is, this falsely includes players where the names don't match. So I need SP.Link -> PR1 -> PR2 -> TP.Link to match up, and also for SP._pageName = TP._pageName in this case (otherwise it could be finding a place where a player is in the participants list of one event but not another). I need the HAVING because PR._pageName -> PR.AllName is one-to-many, so if I just say WHERE TP2.Link IS NULL, it'll say "oh it's satisfied for one of the PR.AllName so let's include this" and I get a bunch of false positives. So the where TP1 is null narrows down my possible solution space, and then the having all TP2.Link be null removes all false positives from that solution space. I can't change  to be a WHERE condition because somewhat counterintuitively, I DON'T want that to be true - I want TP2._pageName to be NULL.   in the where condition also doesn't work, for the same reason that the having is necessary - I get a bunch of false positives in which this is true for one but not all of the names available from PR. So the final query was:


 * and I'm pretty sure there's no way to simplify without adding in a bunch of false positives due to the one-to-many relationship. It occurs to me though that I might be better of making a new version of PlayerRedirects that has all ordered pairs of names in it, so that I can simplify queries somewhat by having only one single copy of that table in the join...maybe I'll set it up that way sometime. --RheingoldRiver (talk) 01:17, 8 January 2020 (UTC)

Multiple-Condition joins (another case)
I get a different number of rows depending on the order of conditions in JOIN. Only the last condition is used. How can this be explained? What am I doing wrong? --StasR (talk) 09:22, 9 January 2020 (UTC)


 * I'm not surprised that the order matters - small changes can make a big difference, as also seen in the section above. Are you sure that only the last "join on" condition is used? Yaron Koren (talk) 17:22, 9 January 2020 (UTC)


 * The order here can affect the performance, but not the result. It seems to me that the above request shall be equivalent to the SQL-query:

SELECT a.p AS ap, b.p AS bp, a.q AS aq, b.q AS bq, a.x  FROM a JOIN b   ON a.p=b.p AND a.q=b.q   WHERE a.x="xxx" OR ( b.x="xxx" AND b.y=1)
 * SQL result does not depend on the order. And it returns far fewer strings, because both ON-conditions apply. --StasR (talk) 18:48, 9 January 2020 (UTC)


 * Are you sure that only the last "join on" condition is used? — Yes, I sure. I made a simple illustrative example. --StasR (talk) 14:34, 10 January 2020 (UTC)


 * Okay, thanks for putting that together. I think the issue is that Cargo uses "LEFT OUTER JOIN" for joins, which is why you're seeing that changing behavior. Actually, just switching the left and right sides of each join will probably lead to different results too. I don't remember now why I chose "LEFT OUTER JOIN". It's certainly worth discussing - maybe a simple "JOIN" would be better. Or maybe the syntax should allow people to set the join type. (If you want to try a different kind of join, by the way, the relevant code is in /includes/CargoSQLQuery.php - there are four instances of "LEFT OUTER JOIN" that would all need to be changed.) Yaron Koren (talk) 15:17, 10 January 2020 (UTC)


 * "LEFT OUTER JOIN" and even exchanging the main and joined table does not change the SQL result (I checked). I think Cargo overwrites previous element instead of adding a new element on the parsing JOIN conditions. --StasR (talk) 16:35, 10 January 2020 (UTC)


 * Yes, indeed! You're right. I think the previous section made me a little overly-optimistic about multiple joins on the same tables. I just checked in what I think is a fix for this. Maybe no change to "LEFT OUTER JOIN" is necessary. Yaron Koren (talk) 20:02, 10 January 2020 (UTC)


 * After this fix is the syntax the same or will you put multiple-condition joins in one line together with AND ? --RheingoldRiver (talk) 18:26, 12 January 2020 (UTC)


 * No, the syntax is the same; what happened before is that earlier join conditions were just getting overwritten. Yaron Koren (talk) 00:49, 13 January 2020 (UTC)

Storing correct redirects in _pageData (again)
Would a solution for this be to store _pageData once at the normal time, so that it can be used in processing the rest of the page, and then go back and re-store it again at the very end of page save? This would require an extra blank edit for a page's categories to apply to itself of course, but it would still be a huge improvement over current. If this seems reasonable to you I'll try and implement myself. --RheingoldRiver (talk) 03:14, 10 January 2020 (UTC)


 * What's the problem that requires a solution? Yaron Koren (talk) 04:13, 10 January 2020 (UTC)


 * Oh I wrote redirects, I meant categories, sorry. When wrong categories are stored the first time they get added on pages. --RheingoldRiver (talk) 05:27, 10 January 2020 (UTC)


 * Oh, okay. Well, you might be in luck! The latest version of Cargo, 2.4 (just released yesterday) contains an attempted fix for the category storage problem, added in December. If/when you get the latest version, hopefully the category setting will work better for you... Yaron Koren (talk) 14:57, 10 January 2020 (UTC)


 * Oh yay, thanks! I'll open a ticket for us to update. --RheingoldRiver (talk) 15:10, 10 January 2020 (UTC)

Incorrect Drilldown
Cargo v. 2.3.1. The table contains 4125 rows. As can be seen on this screenshot, for all the fields, where is the complete list of values, the amount is much smaller. (I beg your pardon, porno is just an abbreviation for sequence number in Russian :-) --StasR (talk) 18:00, 12 January 2020 (UTC)

Yaron, do I understand correctly that the sum of value counters for each field must be equal to the number of rows of the table? --StasR (talk) 10:06, 14 January 2020 (UTC)


 * I'm guessing that the issue is the overlap. So, for example, the first "bucket" is 0-60, while the second is 60-200, so a page that had a value of 60 for that field would get listed in both. I'm not sure what's the best handling for that kind of thing. The problem with making the second bucket 61-200 instead is that there might be non-integer values - so a page that had a value of 60.5 would not show up anywhere. You could have something like ">60-200", but that might look like awkward. Or maybe the current approach is fine? This is all based on the assumption that that is indeed the issue - maybe it's something else. (Another option is that there are pages that have more than one value for this field, in different buckets.) Yaron Koren (talk) 14:34, 14 January 2020 (UTC)


 * The field 'locno' is integer from 1 to 4 (no NULL). And sum (23+10+20+1=54) considerably less than the number of rows (4125). Sums in all fields with a complete enumeration are very small. --StasR (talk) 16:50, 14 January 2020 (UTC)


 * Oh, the number within the buckets is smaller than it should be. For that "locno" field - which of these values should have a bigger amount? Or is it all of them? Yaron Koren (talk) 17:11, 14 January 2020 (UTC)


 * Statistics for "locno" and "event" (collapsed):

locno 	COUNT( locno ) 1 	3472 2 	619 3 	33 4 	1

event 	COUNT( event ) 1 	36 2 	104 3 	1057 6 	207 7 	268 8 	367 13 	466 14 	4 15 	42 16 	418 17 	2 18 	396 19 	149 20 	1 21 	377 22 	174 23 	22 24 	2 25 	30 26 	2 27 	1 --StasR (talk) 19:20, 14 January 2020 (UTC)


 * That's very strange... are there a lot of rows per page, or just one? Yaron Koren (talk) 19:59, 14 January 2020 (UTC)


 * That's very-very strange. Pages generate a large number of rows (but I do so in many projects). Writing data takes place fully and rewriting does not change these amounts. In what module generated the Drilldown results? I'll try to trace and debug it. --StasR (talk) 09:22, 15 January 2020 (UTC)


 * Okay, now it's less surprising. Special:Drilldown, for better or worse, breaks down the data by pages and not rows. So if there are 10 rows in a page all matching a filter value, that will show up as one result, not 10, in both the filter listings and the actual results. (At least, that's how it's supposed to work.) Does the data make more sense now? Or are the numbers still off? Yaron Koren (talk) 14:15, 15 January 2020 (UTC)


 * Then it explains everything, thank you! --StasR (talk) 18:15, 15 January 2020 (UTC)


 * Great. By the way, if you can think of some way to make this clearer in the interface - or, for that matter, if you think the results should change to refer to rows and not pages - let me know. Yaron Koren (talk) 14:00, 16 January 2020 (UTC)

Drilldown
I have noticed that in Drilldown the filter options end up being what you put in the parameters of a template. But that's a problem. If I put yes in a parameter called Presidential, it will appear as "yes" instead of "Presidential". Is there a way to correct that? --Hispano76 (talk) 00:48, 20 January 2020 (UTC)
 * Do you have an example demonstrating this? Gryllida 01:11, 20 January 2020 (UTC)
 * Something like this would be the use of the template
 * On code:
 * On infobox in page:
 * --Hispano76 (talk) 01:19, 20 January 2020 (UTC)


 * Just to clarify: in Special:Drilldown, you want the filter called "Presidential" to also have the value "Presidential"? If so, then within the #cargo_store call you can have something like . Maybe I'm misunderstanding what you're asking for, though. Yaron Koren (talk) 15:42, 20 January 2020 (UTC)
 * Thanks It's partially working. Even if it says presidential= yes it will still appear in the office as Not Presidential But in the title of the filter and the options they look correct but badly listed. I would like a solution to use this in other parameters/filters/options. --Hispano76 (talk) 19:19, 20 January 2020 (UTC)
 * That was my mistake, it's working now. Thank you very much. --Hispano76 (talk) 00:12, 21 January 2020 (UTC)

Suggestion to stop aliasing "_pageName" as (nstab-main) at Special:CargoTables/in queries
In this view where it says Page, I think it should say  instead, overall it would be less confusing, I think "how to get page column" is the most frequently asked Cargo-specific question I get from people trying to use my API. I'm gonna update my docs to emphasize this, but figured I'd mention it here too --RheingoldRiver (talk) 20:40, 23 January 2020 (UTC)


 * I'm glad we're discussing this. In addition to _pageName, there are also the pre-defined tables, _pageData and _fileData, which of course are composed entirely of pre-defined columns. In the case of those columns, I actually want to add descriptive aliases - since a string like "_creationDate" does not mean anything to someone who doesn't speak English. But of course it's also useful to see the actual column name - something I hadn't though enough about before. Maybe the best solution is to show both, and have each such column header look something like "Page (_pageName)". What do you think?
 * By the way, as far as I know the "Page" alias only shows up in Special:CargoTables, not in regular queries. Is that not true? Yaron Koren (talk) 01:56, 24 January 2020 (UTC)


 * Ah, yeah you're right, only that one view. I did want to talk to you eventually about adding native support for the docstrings that atm I'm showing via Lua/ajax workaround on table pages (example) - maybe setting default docstrings for these fields along with calling the field name by the real name in the actual table would work? I'm not sure how Cargo would store these docstrings though, maybe one internal table that has table name, field name, and docstring could be used. --RheingoldRiver (talk) 21:13, 24 January 2020 (UTC)


 * We did actually already talk about the descriptions thing, here. I think it's an unrelated issue, though... unless you think descriptions should be added to the column headers as well (as a hover thing, maybe?). In any case, I still like my "Page (_pageName)" idea. Yaron Koren (talk) 00:59, 27 January 2020 (UTC)


 * Tbh I don't like that, I think it's obvious what it means only if you already know what it means, at which point you don't need the extra descriptor. Having a full docstring above the table where it says "table stucture" currently is 99% whitespace would allow for enough space to actually explain what the field names mean, I feel pretty strongly that's the correct way to do this and also should be extended to allow users to add their own docstrings to non-default fields. --RheingoldRiver (talk) 00:25, 30 January 2020 (UTC)


 * Really, you think table headers with columns like "_creationDate", "_modificationDate", etc. are good enough, even for people who can't speak English, as long as those terms are defined above the table in their language? What if those terms were in Chinese - would that setup still be good enough? Yaron Koren (talk) 01:11, 30 January 2020 (UTC)


 * Yeah, cos the parser function / what you query the fields isn't translated. Unless you want to build in support for queries to use the translated name, I don't think there's any need to show anything than the actual names of the fields as they are used when you write the queries, cos that information is there for developers. If the field names were in Chinese it could be assumed anyone looking at the table page knows enough Chinese to code in it (or at least is used to doing things like counting the number of rows down in the docs and columns across in the table) --RheingoldRiver (talk) 02:27, 30 January 2020 (UTC)


 * Hm, you might be right... there is a certain symmetry in having the "field description" part get used for both user-supplied descriptions (for regular fields) and built-in translations/descriptions (for built-in fields). Yaron Koren (talk) 04:12, 30 January 2020 (UTC)

Error in "where" parameter: the string "#" cannot be used within #cargo_query.
How come this is the case? I wanted to search for tournaments whose names have a  in them because they're probably causing issues with some links. --RheingoldRiver (talk) 18:36, 2 February 2020 (UTC)


 * That restriction is there to prevent people from doing something malicious, because "#" is used to specify a comment in MySQL. I don't know of a way to allow these that wouldn't potentially enable malicious behavior. Maybe there's a way, but until one is discovered, I'm afraid you'll have to... not search on that. Yaron Koren (talk) 03:33, 3 February 2020 (UTC)


 * Ah that makes sense, thanks. --RheingoldRiver (talk) 04:07, 3 February 2020 (UTC)

Cargo 2.4 failing to store categories in _pageData
On gbf.wiki we recently upgraded to Cargo 2.4 (2e93e5c) and a new issue has cropped up.

Cargo doesn't always save all categories on a page to _pageData.


 * Pages with missing categories will only add any new or renamed categories to _pageData when updated. Old missing categories remain missing.
 * Creating a new page with exactly the same categories is usually stored correctly in _pageData.
 * Null editing a page does not fix _pageData, and on pages without issues sometimes causes categories to disappear.
 * We do not know what causes the categories to disappear from _pageData in the first place. These pages worked fine before 2.4.

I created a new category "Cargo Issue" for the known affected pages, but it seems to be spreading. Any help on tracking down the issue and what extra info we could provide to help solve it would be greatly appreciated.


 * Sorry about that - the saving of categories did indeed change, in an effort to improve it - clearly it hasn't always led to improvements. For now, I think there are two ways to get a page's category data to be correct: remove all of the page's categories, save the page, then add them back in and save again; or just call the setCargoPageData.php script, which should result in correct category data for all the pages. Yaron Koren (talk) 14:14, 4 February 2020 (UTC)


 * If it's not working as-is I think saving _pageData twice in the course of a page save, once at the start, and then again at the very end would work without being too expensive an addition. --RheingoldRiver (talk) 19:03, 4 February 2020 (UTC)


 * The problem is, even saving it at the end of a page save (or waiting 10 seconds to save it, or some such) wouldn't necessarily work - the category data can take a long time to get saved. I don't know what the best approach is for saving category data. Yaron Koren (talk) 19:42, 4 February 2020 (UTC)


 * While running setCargoPageData.php we possibly found the cause of another error that has been confounding us. The error we get is:
 * It appears to me like perhaps the 299 byte long right side of the SUBSTR comparison is cropped using php substr which is not multi-byte safe. The full VA category is "Voice Actor: Sōma Saitō", note the missing ō at the end and how it seems to have swallowed the ' in the query.
 * This also affects the _fullText column if byte 1000 is part of a multi-byte character. Ypnocsd (talk) 21:30, 4 February 2020 (UTC)
 * This also affects the _fullText column if byte 1000 is part of a multi-byte character. Ypnocsd (talk) 21:30, 4 February 2020 (UTC)

Internal error on Recent Changes after running cargoRecreateData.php
On one of my wikis I get error "Internal Error" on the "Recent Changes" page after I run cargoRecreateData.php. I initially thought this was a MediaWiki core problem, see https://phabricator.wikimedia.org/T241302 for my bug report and additional info there. It turns out Recent Changes breaks because cargoRecreateData.php creates entries in table recentchanges where rc_title is blank (ie  is not empty). The records are: If I delete the changes, Recent Changes works again, but running the .php command again breaks recent changes again. I run that .php command on a crontab in several other wikis, and none of them have this issue. Tenbergen (talk) 17:45, 4 February 2020 (UTC)


 * It looks like there was some change in MW 1.34 so that now Cargo's logging gets added to the "recentchanges" table (it shouldn't). If you don't mind, could you try changing line 1160 of /includes/CargoUtils.php from:


 * ...to:


 * ...and see if that fixes the problem, if you delete those entries and run the script again? Yaron Koren (talk) 20:57, 4 February 2020 (UTC)

Custom warning on a table or field not existing for cargoquery api
This is maybe something to talk about at emwcon but I have a lot of people using my API at this point, and not a great way to communicate with everyone. When I make changes to my tables I'd really like to be able to set my own warnings for certain Cargo failures...for example if a table doesn't exist then have it try to print (api-cargoquery-tablenotexists-TableName) or if a field doesn't exist then (api-cargoquery-fieldnotexists-TableName-FieldName) so that I can write custom messages there (and if there's no such message then just print the default). How doable would it be to detect some specific error types and try and print a system message for those instead of the actual SQL warning? Is that a reasonable thing to do? What other warnings should maybe get custom messages? --RheingoldRiver (talk) 19:14, 4 February 2020 (UTC)


 * That's an interesting question. How unique would these error/warning messages be? If they would all essentially take the form of "Field X was renamed to Y", there may be an easier solution than creating custom messages for each one. Yaron Koren (talk) 19:45, 4 February 2020 (UTC)