Topic on Extension talk:DynamicPageList (Wikimedia)

Performance concerns regarding the Intersection extension

Sumanah

I see in this bug comment and others that there are performance concerns regarding Extension:Intersection; Tim Starling said a year ago, "We don't want to enable it on any wikis which have a lot of pages, since there is a risk of DB overload due to extremely long-running queries." Is this still the case, or have the performance concerns been addressed?

Also, it's a bit difficult for me to find a more thorough discussion of these performance issues, so a link would be appreciated. Thanks.

Bawolff

Yes, this is still the case.

The performance concerns would be difficult to address (or at least beyond my ability to address) unless we store category membership information in a different type of data structure (i.e. something other than the DB).

I don't think there were any discussions about this; it's fairly self-evident if you look at the code. (People like to point to its being enabled on en.wiktionary as evidence that it can't be that bad, but I think the fact that it's enabled there is more a historical accident than anything else; that is, of course, just my opinion.) One could easily write queries there that would take longer to complete than the timeout for rendering a page.
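To illustrate what "a different type of data structure" could look like, here is a minimal Python sketch. It is hypothetical and is not how DPL, Graphserv, or MediaWiki actually store categories: membership is kept in memory as a map from category name to a set of page IDs, and an intersection starts from the smallest category and tests each candidate against the others. `CategoryIndex`, `add`, `remove`, and `intersect` are made-up names for illustration.

```python
from collections import defaultdict


class CategoryIndex:
    """Hypothetical in-memory index: category name -> set of page IDs."""

    def __init__(self):
        self._members = defaultdict(set)

    def add(self, category, page_id):
        self._members[category].add(page_id)

    def remove(self, category, page_id):
        self._members[category].discard(page_id)

    def intersect(self, categories, limit=None):
        """Return sorted page IDs that belong to every given category."""
        if not categories:
            return []
        # .get() avoids creating empty entries for unknown categories.
        sets = sorted((self._members.get(c, set()) for c in categories), key=len)
        # The smallest set drives the scan; membership tests against the
        # larger sets are O(1) on average for Python sets.
        smallest, rest = sets[0], sets[1:]
        result = sorted(p for p in smallest if all(p in s for s in rest))
        return result if limit is None else result[:limit]


idx = CategoryIndex()
for page in (1, 2, 3, 4):
    idx.add("Birds", page)
for page in (2, 4, 6):
    idx.add("Endangered", page)
print(idx.intersect(["Birds", "Endangered"]))  # [2, 4]
```

The point of the sketch is only that the work is bounded by the size of the smallest category involved, which is the kind of property an external category store would need to offer.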

Bawolff

P.S. I should mention that WM-DE is working on a project that may help make this extension reasonably efficient (that's not their primary goal, but it may help this extension).

He7d3r
Bawolff

No, I mean catgraph/graphcore. Note that their goal is to make the Toolserver catgraph thingy go faster; they weren't really concerned with DPL. Still, I think DPL could possibly be integrated with it (I say that without having looked at the code whatsoever; at the very least it would require a lot of refactoring of DPL).

He7d3r

Oh, good to know of that. Thanks for the link :-)

Daniel Kinzler (WMDE)

Yes, catserv could be used to make category intersection efficient; it was indeed designed for that. There's a PHP client library, and some demo code that shows how to do category intersections and other filtering. That part is pretty straightforward. But:

1) Graphserv is a standalone graph DB engine. It's custom-built C++ software, which would have to be reviewed, deployed, and maintained. And while I have done quite a bit of testing with it and I'm pretty happy with the results, it hasn't seen any long-term or heavy-load testing yet.

2) For Graphserv to be useful, it needs to be up to date: whenever a categorylinks entry is added or removed in MySQL, the same change needs to be made in Graphserv. This isn't entirely trivial.

Actually, I think 2) should be implemented as a separate extension (one that just keeps a Graphserv instance in sync with the MediaWiki DB). DPL could then use Graphserv if it's present.

Bawolff

For 2, I was assuming that a simple hook into LinksUpdateComplete could perhaps do it. That way, whenever the categorylinks table gets updated, so would Graphserv. (Again, I say this without having looked at the Graphserv code even once.)
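The sync step for point 2) can be sketched in Python under one stated assumption: after each links update, we know a page's category set before and after the edit. In MediaWiki that information would come from the categorylinks table; the real LinksUpdateComplete hook is PHP, and its parameters are not modeled here. Only the difference is pushed to the external store, so unchanged memberships cost nothing. `GraphMirror` and `sync_page_categories` are hypothetical names; a real Graphserv client would replace the in-memory edge set. The sketch ignores what makes this non-trivial in practice, for example the initial bulk import, lost or reordered updates, and page deletions.

```python
class GraphMirror:
    """Hypothetical stand-in for an external graph store such as Graphserv.

    Stores (page_id, category) edges in a plain Python set.
    """

    def __init__(self):
        self.edges = set()

    def add_edge(self, page_id, category):
        self.edges.add((page_id, category))

    def remove_edge(self, page_id, category):
        self.edges.discard((page_id, category))


def sync_page_categories(mirror, page_id, old_categories, new_categories):
    """Apply only the category changes for one page to the mirror.

    Meant to run after a page's links have been updated, given the page's
    categories before and after the edit. Returns (added, removed), sorted.
    """
    old_set, new_set = set(old_categories), set(new_categories)
    added = new_set - old_set
    removed = old_set - new_set
    for category in sorted(added):
        mirror.add_edge(page_id, category)
    for category in sorted(removed):
        mirror.remove_edge(page_id, category)
    return sorted(added), sorted(removed)


mirror = GraphMirror()
sync_page_categories(mirror, 42, [], ["Birds", "Endangered"])
# A later edit drops "Endangered" and adds "Europe".
print(sync_page_categories(mirror, 42, ["Birds", "Endangered"], ["Birds", "Europe"]))
# (['Europe'], ['Endangered'])
print(sorted(mirror.edges))
# [(42, 'Birds'), (42, 'Europe')]
```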

Sumanah

For now, I infer that no one has specific plans to make the improvements to Graphserv and related components that would be needed to move forward with deploying the Intersection (DPL) extension to more WMF wikis. So I'm going to remove this extension from the Deployment queue. Thanks for the discussion!

Gryllida
  1. Who maintains GraphServ?
  2. Would storing categories in Wikidata be useful here?
  3. This query suggests using only the recentchanges table. Is that useful?