Manual talk:Page table

Cannot execute the query: Unknown column 'Mrs._Harford'
I'm trying to access the information but I get an error when I place something in the Page_Title spot

wiki_page_title = 'Mrs._Harford'
Cannot execute the query: Unknown column 'Mrs._Harford' in 'where clause'

What could be causing this?

if (! $thiscontent = mysql_query('SELECT old_text from wiki_page, wiki_revision, wiki_text where page_title = "Mrs._Harford" and page_latest = rev_id and rev_text_id = old_id')) { die("Cannot execute the query: ".mysqlerror); } else {echo $thiscontent;}

I figured that the quotes were causing the problem, but now I only get "Resource id #3".

quotes
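when you ask for »"Mrs._Harford"« on a server running with the ANSI_QUOTES sql mode, you're actually asking for a column named literally »Mrs._Harford«. this is the effect of double quoting in that mode: it marks the content as an identifier, not a value. (in mysql's default mode double quotes are also accepted as string delimiters, so their meaning depends on the server configuration.)

what you want is the value »Mrs._Harford«, a string typed value. the way to tell this to the sql parser unambiguously is by using single quotes: »'Mrs._Harford'«.

since you are writing this inside a php string that is itself delimited by single quotes, you must escape the inner ones (»\'«).

another way (and a good practice) is to use double quotes to delimit the sql command in php, so the single quotes inside it need no escaping.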

your text would then look like this:

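if (! $thiscontent = mysql_query("SELECT old_text FROM wiki_page, wiki_revision, wiki_text WHERE page_title = 'Mrs._Harford' AND page_latest = rev_id AND rev_text_id = old_id")) { die("Cannot execute the query: " . mysql_error()); } else { /* mysql_query() returns a result resource, so fetch a row before printing */ $row = mysql_fetch_assoc($thiscontent); echo $row['old_text']; }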
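On the "Resource id #3" follow-up: mysql_query() hands back a result handle rather than the text itself, so a row still has to be fetched from it. The sketch below illustrates that fetch step in Python, using the standard-library sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere). The three miniature tables and the sample text are invented for illustration. It also passes the title as a bound parameter, which sidesteps the quoting question entirely.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with toy versions of the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE wiki_page (page_id INTEGER, page_title TEXT, page_latest INTEGER);
        CREATE TABLE wiki_revision (rev_id INTEGER, rev_text_id INTEGER);
        CREATE TABLE wiki_text (old_id INTEGER, old_text TEXT);
        INSERT INTO wiki_page VALUES (1, 'Mrs._Harford', 10);
        INSERT INTO wiki_revision VALUES (10, 100);
        INSERT INTO wiki_text VALUES (100, 'Hypothetical page text.');
    """)
    return conn


def fetch_page_text(conn, title):
    """Return the latest text of a page, or None if the title is unknown."""
    # execute() returns a cursor (the counterpart of PHP's result resource),
    # not the data itself.
    cursor = conn.execute(
        "SELECT old_text FROM wiki_page, wiki_revision, wiki_text "
        "WHERE page_title = ? AND page_latest = rev_id AND rev_text_id = old_id",
        (title,),  # bound parameter: no manual quoting or escaping needed
    )
    row = cursor.fetchone()  # the actual row has to be fetched explicitly
    return None if row is None else row[0]


conn = build_demo_db()
print(fetch_page_text(conn, "Mrs._Harford"))
```

Printing the cursor itself would show only an object description, much like echoing the PHP resource; the page text appears only after a row has been fetched.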

Moved from page during rewrite
I moved this from Page table: Titoxd (?!?) 08:47, 5 May 2007 (UTC)

To get, for example, the current wikitext of a given page in the main namespace:

SELECT LEFT(old_text,1024) FROM page INNER JOIN revision ON page_latest = rev_id INNER JOIN text ON rev_text_id = old_id WHERE page_title = 'Your page title' AND page_namespace=0;

Note: LEFT(old_text,1024) returns only the first 1024 characters. It is used here because old_text is a blob, so a plain query would probably only display "[BLOB - N KiB]", where N is the size in kilobytes of your page content. If you are using this query in a program and need the entire content of old_text, be sure to select old_text itself and not LEFT(old_text,1024).
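As a rough illustration of the difference between the truncated preview and the full text, here is a sketch in Python with the standard-library sqlite3 module standing in for MySQL (an assumption for the sake of a runnable example). In it, substr(old_text, 1, 1024) plays the role of LEFT(old_text,1024), and the table contents and the helper names preview and full_text are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER, page_namespace INTEGER,
                       page_title TEXT, page_latest INTEGER);
    CREATE TABLE revision (rev_id INTEGER, rev_text_id INTEGER);
    CREATE TABLE "text" (old_id INTEGER, old_text BLOB);
    INSERT INTO page VALUES (1, 0, 'Your page title', 10);
    INSERT INTO revision VALUES (10, 100);
""")
# A hypothetical 3000-character page body.
conn.execute('INSERT INTO "text" VALUES (?, ?)', (100, "x" * 3000))

# Shared join: page -> latest revision -> text row, main namespace only.
JOIN = (
    ' FROM page INNER JOIN revision ON page_latest = rev_id'
    ' INNER JOIN "text" ON rev_text_id = old_id'
    ' WHERE page_title = ? AND page_namespace = 0'
)


def preview(title, limit=1024):
    """Return at most `limit` leading characters, for display purposes."""
    row = conn.execute(
        "SELECT substr(old_text, 1, ?)" + JOIN, (limit, title)
    ).fetchone()
    return row[0] if row else None


def full_text(title):
    """Return the whole text, as a program would normally want."""
    row = conn.execute("SELECT old_text" + JOIN, (title,)).fetchone()
    return row[0] if row else None


print(len(preview("Your page title")), len(full_text("Your page title")))
```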

"varchar(255)" vs. "varchar(255) binary"
Is it really necessary to have "varchar(255) binary" as the column type for page_title (and for some other columns in this and other tables) instead of plain "varchar(255)"? I just spent three evenings debugging why SQLAlchemy (a Python ORM framework) no longer gives me unicode strings after an update. I finally found out that the "binary" after varchar(255) caused the problem. I looked for the "varchar(255) binary" datatype in the MySQL datatype documentation, but it is not described there, as if it were not a supported datatype: http://dev.mysql.com/doc/refman/5.5/en/create-table.html. Since MySQL does not complain when creating a table that uses it, I guess this datatype is either a hidden feature of MySQL or a relic.

I think the motivation for making the page_title column "varchar(255) binary" was to avoid encoding problems with latin1 databases that have to store UTF-8. Wouldn't it be better to use CHARACTER SET UTF8 in the CREATE TABLE statement?


 * I am also investigating that. I found the following discussion.
 * MySQL 5.0 and 5.1 do not support all UTF-8 characters: those that need four bytes in UTF-8 cannot be stored. MySQL 5.5 introduced the character set utf8mb4, which solves the issue.
 * But it seems that for 5.0 and 5.1, a workaround is to add "binary". --Kipmaster 15:46, 30 November 2011 (UTC)
 * ... more on this...
 * I couldn't get "varchar(255) binary" on my system either, but it seems that it is the same as varbinary(255); --Kipmaster 21:09, 30 November 2011 (UTC)
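To make the four-byte point concrete: MySQL's older utf8 character set stores at most three bytes per character, so a title is affected only if it contains a character whose UTF-8 encoding needs four bytes. The following Python sketch checks that for a few sample titles; the function names and the samples are made up for illustration and are not part of MediaWiki or MySQL.

```python
def max_utf8_bytes(title):
    """Return the largest number of UTF-8 bytes used by any single character."""
    return max((len(ch.encode("utf-8")) for ch in title), default=0)


def fits_three_byte_utf8(title):
    """True if every character fits in a three-byte-per-character charset."""
    return max_utf8_bytes(title) <= 3


print(fits_three_byte_utf8("Mrs._Harford"))  # ASCII only: 1 byte per character
print(fits_three_byte_utf8("日本語"))          # CJK: 3 bytes per character
print(fits_three_byte_utf8("\U0001F600"))    # U+1F600 needs 4 bytes
```

The first two calls print True and the last prints False. A binary column avoids the limit because it stores the raw bytes without interpreting them as characters.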

page_is_redirect vs redirects table
The documentation for the page table states:

to find pages that are not redirects add "p.page_id not in (select rd_from from wikidb.redirect)"

Yet, the page_is_redirect column in the page table is supposed to be 1 if the page is a redirect, and 0 otherwise.

Running some queries on the database, one can see that there are big discrepancies between the redirect table and the page_is_redirect column.

For example, using the English Wikipedia dump from April 2011, if we query the number of pages whose page_is_redirect attribute is set to 1, but whose page_id does not appear in the redirect table, we get 2303 such pages:

mysql> SELECT COUNT(page_id) FROM page WHERE page_is_redirect = 1 AND NOT page_id IN (SELECT rd_from FROM redirect);
+----------------+
| COUNT(page_id) |
+----------------+
|           2303 |
+----------------+
1 row in set (16.97 sec)

If we then search for all pages whose page_is_redirect attribute is set to 0, but whose page_id does appear in the redirect table, we get 368269 such pages:

mysql> SELECT COUNT(page_id) FROM page WHERE page_is_redirect = 0 AND page_id IN (SELECT rd_from FROM redirect);
+----------------+
| COUNT(page_id) |
+----------------+
|         368269 |
+----------------+
1 row in set (19.78 sec)

It would seem, then, that the only reliable way to determine all redirect pages would be to scan the text table, look at the wikitext of the latest revision of every page, and check whether it contains the string "#REDIRECT". Due to the size of this table, this will take quite a while, so I'm hoping for a better / quicker way to determine whether or not a page is a redirect. Can I trust the redirect table, even though some pages are not listed in it but have their page_is_redirect attribute set to 1? Please advise. Thank you! 129.100.19.21 21:52, 9 June 2011 (UTC)

This was a bug in MWDumper (38919). --Tgr (talk) 10:04, 31 October 2012 (UTC)
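For anyone who wants to recheck an import, the two consistency checks from the question can be wrapped in a small script. The sketch below uses Python's standard-library sqlite3 module with four invented rows as a stand-in for a real MySQL dump (an assumption for illustration only). The helper names are hypothetical; the WHERE clauses mirror the queries quoted above, with "NOT page_id IN" written in the equivalent "page_id NOT IN" form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_is_redirect INTEGER);
    CREATE TABLE redirect (rd_from INTEGER);
    -- toy rows: 1 = consistent redirect, 2 = flagged only,
    --           3 = listed only,        4 = ordinary page
    INSERT INTO page VALUES (1, 1), (2, 1), (3, 0), (4, 0);
    INSERT INTO redirect VALUES (1), (3);
""")


def flagged_but_not_listed(conn):
    """Pages with page_is_redirect = 1 that have no row in redirect."""
    rows = conn.execute(
        "SELECT page_id FROM page WHERE page_is_redirect = 1 "
        "AND page_id NOT IN (SELECT rd_from FROM redirect) ORDER BY page_id"
    )
    return [r[0] for r in rows]


def listed_but_not_flagged(conn):
    """Pages with page_is_redirect = 0 that do have a row in redirect."""
    rows = conn.execute(
        "SELECT page_id FROM page WHERE page_is_redirect = 0 "
        "AND page_id IN (SELECT rd_from FROM redirect) ORDER BY page_id"
    )
    return [r[0] for r in rows]


print(flagged_but_not_listed(conn), listed_but_not_flagged(conn))
```

On a consistent import both lists should come back empty; with the toy rows above they contain page 2 and page 3 respectively.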