Bug management/Triage/Databases 20111102

notes archived from Etherpad

Also see Database testing and New installer/Test plan

Wednesday, 2 November: Non-MySQL databases at 19:00 UTC
Discuss, validate & prioritize MediaWiki bugs that affect Postgres, SQL Server, Oracle, SQLite, MariaDB, and other non-MySQL databases. Come out with a first plan on improving support.

1.5 hours long


 * Location: #wikimedia-dev on Freenode IRC

Attendees (potential):

 * DJ Bauch -- will attend. 1.16.5 is working fine for me on Windows 7, but times out a lot on Windows Azure. 1.19alpha (what about 1.18 -- need a tarball? any easy way for others to test? use the REL1_18 branch?) on Windows 7 is still having problems with UTF-8 characters in articles and templates, and frequent failures to update the objectcache table as of 27 October.
 * Ben Lobaugh
 * Greg Sabino Mullane
 * freakolowsky
 * Max Semenik
 * Colin Charles
 * Mark Hershberger
 * Brian Swan
 * Jonathan Guerin
 * Sumana Harihareswara
 * Niklas Laxström
 * Andy Lester

Bugs to validate

 * https://bugzilla.wikimedia.org/show_bug.cgi?id=30787 ** PostgreSQL 9.0 default 'mediawiki' schema causes failure, use 'public' instead.


 * I don't think there's a way to set a default schema, other than search_path.


 *  The user they use should already have the search_path changed.


 * G_SabinoMullane: I'd love to see whether anyone can reproduce this error; if not, I'm happy to close it.


 *  I'm going to guess that they are connecting with the wrong Postgres user in their LocalSettings.php


 *  The installer should be doing an ALTER USER to set the default search_path


 * and also note the "are you on the wrong user"


 *  But without knowing the exact steps they took, it's impossible to know if it's a true bug or user error.


 * <^demon> "ALTER ROLE $safeuser SET search_path = $safeschema, public"


 * Conclusion: commented on bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=30787#c2
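
For reference, the per-role fix ^demon quotes above can be applied by hand roughly like this (the role and schema names below are placeholders for illustration, not taken from the bug report):

```sql
-- Check what the connected role currently resolves:
SHOW search_path;

-- Persist a default search_path for the wiki's role so its
-- 'mediawiki' schema is consulted before 'public'
-- ('wikiuser' and 'mediawiki' are example names):
ALTER ROLE wikiuser SET search_path = mediawiki, public;

-- The setting takes effect on the role's next connection.
```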


 * https://bugzilla.wikimedia.org/show_bug.cgi?id=20475 ** SpecialExport producing corrupt output (PostgreSQL errors)


 *  I think OverlordQ summed it up well. Yet more fun with the loadbalancer


 * One of the unfortunate aspects of pg_query is that it doesn't require a connection as its first parameter once a connection exists. It lets you call pg_query( $conn, $sql ) the first time, and then pg_query( $sql ) thereafter, and I'm guessing that somewhere before 1.17.0 (which I'm looking at now) there were pg_query calls that incorrectly relied on that behavior. Anyway, I'm looking in DatabasePostgres.php and I don't see anywhere it calls pg_query without a connection object.


 *  It's pg_connect that's the issue, no?


 * But I don't have the 1.16-svn source in front of me.


 * to repro on MediaWiki 1.17.0:  it should be easy to test, make some jobs and make sure they are run after call to Special:Export


 *  alester: you should also have non-zero jobqueue


 *  I've exported pages just fine before (but never tried to specifically duplicate this bug)


 *  Nikerabbit, I have: Notice: unserialize [function.unserialize]: Error at offset 0 of 53 bytes in D:\Projects\MediaWiki\includes\objectcache\SqlBagOStuff.php on line 381


 * I just now ran an export on all the pages in on my company wiki and it ran fine w/o errors.


 *  I wouldn't be surprised if it is really fixed; there have been many changes since 2009


 * Conclusion: Sumana asked for repro and closed for no repro in https://bugzilla.wikimedia.org/show_bug.cgi?id=20475#c4


 * https://bugzilla.wikimedia.org/show_bug.cgi?id=22579 PostgreSQL Schema option ignored


 * 22579 is the same as the first ticket we looked at


 *  Does not seem the same, but does claim to be fixed. Unfortunately nobody left a revision


 * ok, so I will tell the user to please check that s/he used the correct Postgres user


 * Conclusion: resolved WFM, asked user to try to repro with 1.17.0 https://bugzilla.wikimedia.org/show_bug.cgi?id=22579#c3


 * https://bugzilla.wikimedia.org/show_bug.cgi?id=32118 test special pages SQL queries against all supported databases


 *  throw some test data at it, then call internally via the API?


 * without having the test know how to access the db itself you could do that. insert via api and retrieve via api, compare


 * sumanah: i can work with djbauch to make sure the MSSQL driver is fully tested


 * Yeah, I can work with Ben on all the required MS SQL tests.


 * sumanah: is there no specification for test for dbs right now?


 *  Nikerabbit: I don't think so.


 *  blobaugh, first, you have to get normal tests to work on your backend


 *  Special pages are working OK for me now on SQL Server. Even the GROUP BY stuff


 * Conclusion -- Ben & DJ to work on this - for MSSQL at least. if an abstract method can be ported to each we will try to find it


 * https://bugzilla.wikimedia.org/show_bug.cgi?id=28172 wfGetDB called when it shouldn't be


 * <G_SabinoMullane> I dug into 28172 quite a bit, but finally gave up after swimming in the sea of LoadBalancer objects and DB calls.


 * <^demon> I've never managed to repro it after I fixed it for mysql/sqlite.


 * <G_SabinoMullane> I think Chad H or Tim need to take a look at 28172


 * <G_SabinoMullane> OverlordQ and I are stumped :)


 * <G_SabinoMullane> Would be nice if all the LB stuff was documented somewhere


 * <DJBauch> I see this (Bug 28172) happen when an install fails to complete for some reason (on SQL Server in my case) and then tries to resume. Restarting the install from scratch after fixing whatever caused the failure works around the problem that the load balancer has been turned off and never turned back on.


 * <^demon> G_SabinoMullane: I have looked at 28172.


 * <^demon> And have never been able to repro it.


 * <G_SabinoMullane> ^demon: Odd. I can dupe quite easily


 * <^demon> It shouldn't differ by dbms, since that code is shared.


 * hmm, and Overlord's repro was like 3 days ago


 * <^demon> And I haven't been able to replicate on mysql or sqlite.


 * <G_SabinoMullane> I think the key is here: #2 /var/www/thedarkcitadel.com/w/includes/User.php(2858): wfGetDB(-1)


 * <^demon> After the db has been initialized the LB is re-enabled.


 * <^demon> And then User can do its thing.


 * <G_SabinoMullane> ^demon: Why is the LB being disabled there?


 * <MaxSem> e.g. to prevent you from creating a user when you shouldn't :)


 * <^demon> The LB code automatically does things like connect to the database and such.


 * <MaxSem> =when DB is not ready yet


 * <^demon> Right, it's to keep you from breaking things accidentally.


 * Conclusion: https://bugzilla.wikimedia.org/show_bug.cgi?id=28172#c10 Sumana took this discussion back to the bug and asked Tim to poke his head in when he has a moment. Tim's response: "If someone can provide a reliable reproduction procedure on PostgreSQL or SQLite, then we can take it from there."


 * https://bugzilla.wikimedia.org/show_bug.cgi?id=22010 SQL error while deleting on SQLite


 * <MaxSem> never was able to repro it outside of one page


 * Closed as WFM, meh.


 * https://bugzilla.wikimedia.org/show_bug.cgi?id=31696 update.php fails with SQLite


 * <MaxSem> I would love to see a repro for this bug


 * <MaxSem> looks like the connection is getting closed somewhere


 * I may try to reproduce since I probably face similar / same issues with MSSQL


 * Conclusion: added https://bugzilla.wikimedia.org/show_bug.cgi?id=31696#c13 and DJ will try to reproduce this.


 * https://bugzilla.wikimedia.org/show_bug.cgi?id=28512 SQLite installation via CLI fails to expand ~paths


 * <^demon> Rather than trying to jump through hoops to figure out where the path is, why not just error out when a ~ path is given and say "give a full path please"


 * <G_SabinoMullane> That's some hacky code there. Why are we not just disallowing ~ ?


 * <^demon> G_SabinoMullane: Exactly. It's easier just to say "don't do that" than anything else.


 * 28512: low-priority, someone should fix this eventually, installer issue, move on?


 * Conclusion: prioritized low. https://bugzilla.wikimedia.org/show_bug.cgi?id=28512#c5


 * https://bugzilla.wikimedia.org/show_bug.cgi?id=28281 Differentiate between MySQL and MySQL forks (ie MariaDB)


 * Low priority enhancement -- someone will get to this eventually? Not that big a deal.


 * agreed. Is this in any way *easy* (for newbs) or s


 * <^demon> I dunno what MariaDB would need to do.


 * sumanah: that looks like an issue on the MariaDB side?


 * <^demon> Presumably we'd like to have some way in php to differentiate the two.


 * the only way to fix that is to contribute the patch to them


 * blobaugh: I mean, MariaDB people would donate a bit of time to fix it in MediaWiki.


 * <^demon> Again...


 * <^demon> I'm not sure what MariaDB needs to do?


 * <^demon> We assume it's mysql because we're using mysql_* functions


 * ^demon: it looks like just change the link on the Special:Version page


 * <Platonides> getSoftwareLink currently returns a static string


 * <Platonides> (DatabaseMysql class)


 * <^demon> Platonides: Right. Because we assumed DatabaseMysql would only ever refer to mysql.


 * <Platonides> the work is to somehow differentiate between the two


 * so to fix it a new class would need to be made? DatabaseMariaDb?


 * <^demon> Platonides: But other than a software link, do we need to differentiate at all?


 * <^demon> ie: Would the subclass be anything other than overriding getSoftwareLink?


 * Not in the code, yet


 * <Platonides> make mysql family servers add a command 'SELECT UPSTREAM_LINK;' :)


 * ha


 * <Platonides> I don't think so


 * tell you what


 * I will write to Colin and ask whether they care


 * ^demon: afaik you are correct


 * <Platonides> if we switched to it, perhaps we would begin optimizing some edge cases


 * <G_SabinoMullane> I thought MariaDB was meant to be super transparent with regards to working MySQL code?


 * <Platonides> but it's similar enough to mysql that I don't think we would make any change in the class


 * Conclusion: Sumana wrote to MariaDB contact, asked whether they care, posted discussion at https://bugzilla.wikimedia.org/show_bug.cgi?id=28281#c2
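
One low-effort way to tell the two apart, since the server's version string already embeds the fork name on MariaDB (a sketch; exact version strings vary by build and distribution):

```sql
-- On MariaDB this returns something like '5.2.9-MariaDB';
-- on stock MySQL, something like '5.1.49-log'. A DatabaseMysql
-- subclass (or getSoftwareLink() itself) could branch on the result.
SELECT VERSION();
```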


 * https://bugzilla.wikimedia.org/show_bug.cgi?id=26273 ** Database layer should automagically add GROUP BY columns on backends that need them


 * <^demon> For that, we need to file upstream bugs so all the other DBMSs act like Mysql.




 * <G_SabinoMullane> There is a related bug to 26273 but I don't know where it is offhand. Tim and I made some hand-waving solutions for this.


 * <G_SabinoMullane> Basically, we need to have MW gather the list of columns from the tables and create the GROUP BY on the fly.


 * re https://bugzilla.wikimedia.org/show_bug.cgi?id=26273 -- is this a longterm project for some interested developer, then?


 * like, GSoC level? or intractable? or what?


 * G_SabinoMullane: why not have a var that lists, per table, the correct group by?


 * <G_SabinoMullane> blobaugh: Because schema changes (new columns) would break it


 * <Nikerabbit> not sure if it is worth full GSoC, but looks like non-trivial amount of work indeed


 * This bug (26273) used to rear its ugly head a lot. Not so much any more. It was just one special page the last time I checked. Staying away from GROUP BY 1,2,3, etc. and sticking to the more stringent rules enforced by SQL Server (e.g., making sure to name the fields by name rather than by alias) has helped a lot.


 * G_SabinoMullane: could this be a global that gets updated on schema changes manually?


 * <^demon> blobaugh: We can barely keep people updating all the proper places when they do a schema change anyway...a global they'd have to keep in check too would be impossible.


 * ^demon: fair enough


 * <G_SabinoMullane> blobaugh: That seems like a lot of work compared to simply reading the db cols directly


 * G_SabinoMullane: maybe, but it would make the system much faster than needing another db call


 * djbauch: do you think it would be fairly easy to generate a list of the places in MediaWiki that name fields by alias?


 * Sumanah: Yes, there were only two places that caused a problem that I found. One special page and one obscure place.


 * Conclusion: suggest this project to developers who want a substantial MediaWiki improvement project. done in https://bugzilla.wikimedia.org/show_bug.cgi?id=26273#c4
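
To illustrate the mysqlism under discussion (the queries are sketches against MediaWiki's revision table, not code from any particular special page):

```sql
-- MySQL accepts a non-aggregated, non-grouped column in the select
-- list (rev_comment below), silently returning an arbitrary value
-- per group:
SELECT rev_page, rev_comment, COUNT(*) AS edits
FROM revision
GROUP BY rev_page;

-- Standards-enforcing backends (Postgres, SQL Server, Oracle) reject
-- that; every non-aggregated selected column must appear in GROUP BY,
-- which is the list MW would have to generate on the fly:
SELECT rev_page, rev_comment, COUNT(*) AS edits
FROM revision
GROUP BY rev_page, rev_comment;
```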

 * MSSQL encoding issue? http://mediawikiworker.blogspot.com/2011/10/struggling-with-inexplicable-issue.html


 * I think this problem must be of my own creating? I just don't see it yet. I thought maybe somebody may have seen something similar.


 * <Platonides> sumanah, the question would be: how was that cached version generated?


 * <Platonides> maybe mssql corrupted the gzipped blob containing the cached page


 * Platonides: I have turned off the gzipping of the object cache because I don't get good results when it's turned on. Maybe that's a symptom of the same problem?


 * <DJBauch> What blows me away is difference in rendering between the edit preview and the rendered article. Does this indicate a potential problem with [objectcache]?


 * BTW, I expect to get this solved through sheer persistence


 * <brian_swan> djbauch: you know NVARCHAR is ucs-2, right?


 * <brian_swan> djbauch: http://stackoverflow.com/questions/144283/what-is-the-difference-between-varchar-and-nvarchar


 * I guess that's what I need to address (UCS-2 vs UTF-8).


 * Conclusion: let DJ work on it :-)

Ideas:
Conclusion: it would be great if someone worked on this, but no one has made, or is making, the time to do so. This would be a substantial project.

Colin Charles writes that to improve how well MediaWiki works on PostgreSQL, look at the example of WordPress, specifically at their SQL file and at db.php: "This is the best bet. Though remember that the wordpress version is possibly also out of date. Some jokers decided to also change the db file to ensure they get mongodb support!"

Ben Lobaugh offered to help us get some Windows machines, or a setup in a Microsoft-run Azure cloud, so we can test on Microsoft SQL Server properly. Ryan Lane said, "I'd be more than happy for microsoft to manage the windows vms for us; if they offer that, I'll be glad to add windows support to labs." Brion noted: "this page indicates azure has DCs in san antonio, chicago, dublin, amsterdam, singapore, and hong kong -> http://cloudinteropelements.cloudapp.net/Choosing-the-Data-Center-Location-with-Windows-Azure.aspx ... san antonio is probably best for us, but we should check chicago also (from eqiad)" Ben Lobaugh volunteered to be the lead on this (testing non-MySQL databases in the Wikimedia Labs infrastructure, and additional infrastructures as necessary).
 * Run some production Wikimedia wiki on a non-MySQL database, and have the Wikimedia staff support it
 * Absolutely not -- we have zero reason to support non-MySQL in production WMF sites
 * Design that meta-schema idea that has been kicked around such that we have no more tables.sql anymore (at least not as the canonical source)
 * Yes! Major gains to be had here
 * <^demon> Max and I started doing that in new-installer.
 * <^demon> And it was actually working not half bad.
 * <^demon> But we postponed it so it wouldn't delay 1.17 further.
 * <^demon> If someone wanted to pick that up again, we stashed it in some branch.
 * abstract-schema -- http://svn.wikimedia.org/viewvc/mediawiki/branches/abstract-schema/
 * <^demon> in includes/db/ you'll find Schema and some related classes.
 * <Platonides> ^demon, I don't like it
 * <Platonides> that's not... readable
 * <^demon> Platonides: I'm not married to it, we can scrap it entirely and go another route if we'd like.
 * <G_SabinoMullane> Platonides: Agreed: the one big array is pretty ugly. Sorry, ^demon :)
 * <Platonides> if it read the SQL schema and created the structure on the fly, maybe it could do something
 * (nobody likes it) -- likes the current branch, I think
 * <G_SabinoMullane> On a related note, I wrote something that uses tables.sql from the updater to add new tables and columns, rather than creating all those annoying "patch.sql" files.
 * <G_SabinoMullane> Someday I will clean it up and publish it; but I should look over the abstract-schema first.
 * Fix the groupby issue (one of the more glaring mysqlisms (to be fair, not entirely a mysqlism, but for all intents and purposes is) -- https://bugzilla.wikimedia.org/show_bug.cgi?id=26273
 * Also yes.
 * no one has volunteered to do this.
 * Add other dbs to all aspects of automated tests, both public and private
 * Private what?
 * Improve the documentation
 * Bug 1 ;-)
 * Automated tests need to be done against each db. Should WMF be responsible for maintaining all the supported types of db servers (includes maint, patches, updates, etc) or should the db type dev be responsible for running tests prior to each new MW release?
 * I am still interested in determining how the unit tests are going to be run. Is WMF running each db server or should the db module dev be responsible?
 * <Platonides> I think wmf _should_ provide test runners
 * <Platonides> those labs VM that supposedly are available for everything
 * Platonides: ok, so how do we get things like Oracle going? Is there a good free version?
 * I can help get licenses for SQL Server, but that is only one db
 * <MaxSem> blobaugh, free edition of MSSQL is not enough for tests?
 * MaxSem: I am trying to determine that now, but if MS is willing to donate SQL Server why say no?
 * Sumana notes that we might be able to get donated licenses for Oracle if we want them -- we are a nonprofit and well-known
 * Why deal with Oracle at all when we can download the free version?
 * There may be restrictions and limits that we do not want to risk hitting
 * Is it a good idea to try to stick to NVARCHAR rather than VARCHAR columns for just about everything?
 * <^demon> Most everything in mysql is varchar.
 * <Platonides> blobaugh, there's a free version which could be used
 * <^demon> Same with Oracle.
 * <G_SabinoMullane> blobaugh: No, it would only be called once, then cached.
 * G_SabinoMullane: oh, good
 * <G_SabinoMullane> At least, according to Tim. Now it is just a SMOP for someone....
 * Yes, and I tended to leave most things that way, but I see that the other person who has done work on MSSQL made much heavier use of NVARCHAR than I did. With data stored in UTF-8, it seems like the columns should be NVARCHAR? I think VARCHAR BINARY is like NVARCHAR, no?
 * alester: do you have any thoughts on the above?
 * no. I shudder in fear for how you guys do cross-DB stuff
 * <brian_swan> Yes.
 * <^demon> alester: We just make it work for mysql && sqlite and hope for the best.
 * Yeah, that's kinda sad. :-)
 * <G_SabinoMullane> I would think nvarchar should be the default
 * But I don't have a better solution. :-)
 * especially with an all-volunteer army.
 * <G_SabinoMullane> No NVARCHAR is not the same as VARCHAR BINARY
 * Platonides: The response I got was that SQL Express will probably work for unit tests, but that I need to talk to another guy to make sure. brian_swan's coworker actually
 * It would probably be better to just get a full license though. Especially since that is what people will run it on. SQL Express is not designed for production
 * <G_SabinoMullane> djbauch: VARCHAR BINARY is used to store raw bytes, e.g. other db's BLOB or BYTEA. Very few tables need that in MediaWiki, generally just the caching stuff.
 * <^demon> G_SabinoMullane: We do use varbinary for our timestamps in mysql ;-)
 * <G_SabinoMullane> ^demon: Yeah, I know. You just can't seem to get timestamps right :)
 * <^demon> G_SabinoMullane: iirc, the reason is largely historical.
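
A minimal T-SQL sketch of the VARCHAR/NVARCHAR distinction discussed above (the literal is an arbitrary example; what the VARCHAR column actually does to it depends on the database's collation and code page):

```sql
-- NVARCHAR stores UCS-2/UTF-16 and preserves any Unicode text;
-- VARCHAR is bound to the database's code page, so characters
-- outside that code page are lost. The N'' prefix marks a
-- Unicode literal.
DECLARE @narrow VARCHAR(20)  = N'Grüße 日本語';  -- may degrade to 'Grüße ???'
DECLARE @wide   NVARCHAR(20) = N'Grüße 日本語';  -- round-trips intact
SELECT @narrow AS narrow, @wide AS wide;
```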