Manual talk:Categorylinks table

Suggested Database Engines
What are the suggested mysql database engines for this table? Wissenslogistiker 15:55, 5 March 2006 (UTC)
 * InnoDB as for nearly all tables in a MediaWiki database. --88.130.126.202 02:01, 25 January 2012 (UTC)

cl_type
This field is not currently documented. Incnis Mrsi 13:37, 3 January 2012 (UTC)
 * Added. --88.130.69.106 13:41, 30 January 2012 (UTC)

Sortkeys in database vs. sortkeys provided in category links
I think the Fields section here needs to clarify the connection between sortkeys used in the database (cl_sortkey and cl_sortkey_prefix) and sortkeys provided by wiki editors in category links of the form [[Category:Name|sortkey]]. I've just edited the lead to (hopefully?) lay the groundwork for this, but I don't know if my changes to the Fields section would even be totally correct, since I only understand category links from a wiki editor's perspective. Can someone please clarify whether the parenthetical remark "aka the sortkey is unspecified" in the description of cl_sortkey_prefix refers to user-supplied sortkeys (after the pipe in category links), and whether "Otherwise it is the human readable version of cl_sortkey" means that it contains the literal user-supplied sortkey text? - dcljr (talk) 07:39, 22 March 2012 (UTC)

Extracting category information by simple text searching of the sql dump file
I've been wanting to get a set of categories for each page of an entire wiki using the SQL categorylinks database dump. That's helpful, e.g. when identifying the licensing for a given image file on Wikimedia Commons. Instructions on the internet for using these SQL dump files assume you have a running MySQL database to load the dump files into. But this is slow and inconvenient if you just want to grab a basic list of categories.

To help extract data from these Wikimedia SQL dump files, I've cobbled together the following Perl script. It processes e.g. the current version of the Wikimedia Commons dump (http://dumps.wikimedia.org/commonswiki/20130216/commonswiki-20130216-categorylinks.sql.gz) and prints the categories out to a text file. This file has the category names listed on the line corresponding to the page_id, for easy access. For example, the 2 millionth line in the output, when run on this categorylinks file, is:

2000000:'Narda','Our_Lady_of_Sorrows_churches_in_Hungary','PD-self','Roman_Catholic_Churches_in_Vas_County'

This shows the categories for page_id 2000000 (http://commons.wikimedia.org/wiki/File:Kisnarda01.jpg).

This script takes under an hour to process the 18 GB Commons file on my old computer. I suspect it can also be sped up by some clever Perl programmer. Perhaps other people will find such a script useful? It is rather dependent on the exact layout of the SQL dump file, though.
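The Perl script itself is not reproduced here. As a rough illustration of the same text-extraction idea, here is a minimal Python sketch (the function and regex names are my own, not from the original script) that pulls (cl_from, cl_to) pairs out of the INSERT statements in an uncompressed categorylinks dump:

```python
import re
from collections import defaultdict

# Each INSERT line in the dump holds many value tuples of the form
# (cl_from,'cl_to','cl_sortkey',...). This regex grabs the leading
# page id and the first quoted string (the category name), allowing
# backslash-escaped quotes inside the name. Like the script above,
# it is a heuristic tied to the exact layout of the dump file.
TUPLE_RE = re.compile(r"\((\d+),'((?:[^'\\]|\\.)*)'")

def categories_by_page(sql_text):
    """Map page_id -> list of category names from raw dump text."""
    pages = defaultdict(list)
    for line in sql_text.splitlines():
        if line.startswith("INSERT INTO"):
            for cl_from, cl_to in TUPLE_RE.findall(line):
                pages[int(cl_from)].append(cl_to)
    return pages
```

For the real 18 GB file you would stream it line by line (e.g. from gzip.open in text mode) rather than hold it in memory; the category names come out in the same underscore form as in the example output above.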

HYanWong (talk) 02:28, 1 March 2013 (UTC)


 * And some equivalent C code that does it in 10 mins. HYanWong (talk) 14:23, 1 March 2013 (UTC)