Manual talk:Page table

Jump to navigation Jump to search

Cannot execute the query: Unknown column 'Mrs._Harford'[edit]

I'm trying to access the information but I get an error when I place something in the Page_Title spot

wiki_page_title = 'Mrs._Harford' Cannot execute the query: Unknown column 'Mrs._Harford' in 'where clause'

What could be causing this?

if (! $thiscontent = mysql_query('SELECT old_text from wiki_page, wiki_revision, wiki_text where page_title = "Mrs._Harford" and page_latest = rev_id and rev_text_id = old_id')) { die("Cannot execute the query: ".mysqlerror()); } else {echo $thiscontent;}

I figured that the quotes were causing problems, but now I only get Resource Id #3


when you ask for »"Mrs._Harford"«, you're actually asking for a column named literally »Mrs._Harford«. this is the effect of double quoting: protecting the content from further interpretation.

what you want is the value »Mrs._Harford«, a string typed value. the way to tell this to the sql parser is by using single quotes: »'Mrs._Harford'«.

obviously, since you are writing this in a string being delimited by single quotes itself, you must escape them.

an other way (a good practice) would be to use double quotes to delimit sql commands.

your text would then look like this:

if (! $thiscontent = mysql_query("SELECT old_text from wiki_page, wiki_revision, wiki_text where page_title = 'Mrs._Harford' and page_latest = rev_id and rev_text_id = old_id")) { die("Cannot execute the query: ".mysqlerror()); } else {echo $thiscontent;}

Moved from page during rewrite[edit]

I moved this from Page table: Titoxd(?!?) 08:47, 5 May 2007 (UTC)

To get e.g. the current wikitext of a given page in the main namespace:

   INNER JOIN revision ON page_latest = rev_id 
   INNER JOIN text ON rev_text_id = old_id
   page_title = 'Your page title' 

Note: The use of LEFT(old_text,1024) allows the first 1024 characters to be printed because, old_text being a blob, your query would probably only display "[BLOB - <n>.<n> KiB]" where <n>.<n> is the size in kilobytes of your page content. If you are using this query in a program and need the entire content in old_text, be sure to only reference old_text and not LEFT(old_text,1024).

"varchar(255)" vs. "varchar(255) binary"[edit]

Is it really necessary to have "varchar(255) binary" as column type for page_title (and some other columns of this and other tables) instead of "varchar(255)"? I just spent three evenings debugging trying to find out why sqlalchemy (one python ORM framework) does not give me unicode strings anymore (after an update). I finally found out that the "binary" after the varchar(255) caused the problem. I tried to find the "varchar(255) binary" datatype in the MySQL datatype definition documentation but it is not described (as it was no supported datatype): I guess that this datatype, as MySQL does not complain when creating a table using it, is either a hidden feature of MySQL or a relict.

I think that the motivation making the page_title column "varchar(255) binary" was to avoid encoding problems with lating1 databases having to store UTF8. Wouldn't it be better to use CHARACTER SET UTF8 in the CREATE TABLE statement?

I am also investigating that. I found the following discussion [1].
MySQL 5.0 or 5.1 does not support all UTF-8 characters (called the four-byte utf-8). MySQL 5.5 introduced a character set utf8mb4 which solves the issue [2].
But it seems that for 5.0 and 5.1, a workaround is to add "binary". --Kipmaster 15:46, 30 November 2011 (UTC)
... more on this...
I couldn't get "varchar(255) binary" on my system either, but it seems that it is the same as varbinary(255); --Kipmaster 21:09, 30 November 2011 (UTC)

When I import this I get titles in the form of '41666768616e697374616e50656f706c65'. How do i fix it so it is UTF-8 imported?

page_is_redirect vs redirects table[edit]

The documentation for the page table states:

 to find pages that are not redirects add "p.page_id not in (select rd_from from wikidb.redirect)"

Yet, the page_is_redirect column in the page table is supposed to be 1 if the page is a redirect, and 0 otherwise.

Running some queries on the database, one can see that there are big discrepancies between the redirect table and the page_is_redirect column.

For example, using the English Wikipedia dump from April 2011, if we query the number of pages whose page_is_redirect attribute is set to 1, but whose page_id does not appear in the redirect table, we get 2303 such pages:

 mysql> SELECT COUNT(page_id) FROM page WHERE page_is_redirect = 1 AND NOT page_id IN (SELECT rd_from FROM redirect);
 | COUNT(page_id) |
 |           2303 |
 1 row in set (16.97 sec)

If we then search for all pages who page_is_redirect attribute is set to 0, but whose page_id does appear in the redirect table, we get 368269 such pages:

 mysql> SELECT COUNT(page_id) FROM page WHERE page_is_redirect = 0 AND page_id IN (SELECT rd_from FROM redirect);
 | COUNT(page_id) |
 |         368269 |
 1 row in set (19.78 sec)

It would seem, then, that the only reliable way to determine all redirect pages would be to scan the text table, look at the Wikitext for the latest revisions of all pages, and determine whether or not they contain the string "#REDIRECT". Due to the size of this table, this will take quite awhile, so I'm hoping for a better / quicker way to determine whether or not a page is a redirect. Can I trust the redirect table, even though some pages are not listed in it but have their page_is_redirect attribute set to 1? Please advise. Thank you! 21:52, 9 June 2011 (UTC)

This was a bug in MWDumper (bugzilla:38919). --Tgr (talk) 10:04, 31 October 2012 (UTC)

page table = "core of the wiki"?[edit]

I would think that the text, page, and revision tables together are the core of the wiki. Leucosticte (talk) 02:08, 7 October 2013 (UTC)

Indeed. And, if anything, page is the one of the three you need the least, too - all it has that the others don't is titles. -— Isarra 08:14, 9 October 2013 (UTC)

Two requests[edit]

  • Could someone please fix the link includes/Defines.php in section page_namespace which redirects to phabricator's general search page.
  • Could someone please add a short definition of new to the section page_is_new.

Thank you very much. --Achim (talk) 18:20, 27 May 2016 (UTC)

How's that? Robin Hood  (talk) 21:04, 27 May 2016 (UTC)
Robin Hood, fine, thank you. --Achim (talk) 07:58, 28 May 2016 (UTC)

No Text Table[edit]

The text of the page itself is stored in the text table.

My wiki does not contain a text table. Did this architecture change? Johnywhy (talk) 16:13, 9 June 2018 (UTC)

Well, that's impossible. Even if you have configured External Storage, there's still a text table. --Ciencia Al Poder (talk) 09:36, 10 June 2018 (UTC)

Found it. But the only fields i see are old_id, old_text, and old_flags. That doesn't sound like live data. — Preceding unsigned comment added by Johnywhy (talkcontribs)

Those column names have its history, but anyway, documentation exists so you don't have to infer where or how things are stored by looking at how they're named. --Ciencia Al Poder (talk) 09:25, 12 June 2018 (UTC)

How to query live pages?[edit]

Is this correct selection to get only live pages?

This query returns results, but seems to be missing some pages (based on my wiki content):

SELECT page.page_title FROM `text`, `page` WHERE `old_text`LIKE '%test%' AND 
page.page_latest = text.old_id

This also works. Wondering how to check if index is up-to-date.

SELECT page_id, page_namespace, page_title 
FROM `page`,`searchindex`
WHERE page_id=si_page
AND MATCH(si_text) AGAINST('+''gross''' IN BOOLEAN MODE)
AND page_is_redirect=0

— Preceding unsigned comment added by Johnywhy (talkcontribs)

This works only if page text is not compressed (Manual:$wgCompressRevisions). --Ciencia Al Poder (talk) 09:17, 13 June 2018 (UTC)

Which query? Which table? searchindex is the table being searched, and that's never compressed. Correct? Johnywhy (talk) 12:15, 13 June 2018 (UTC)

Text table will be compressed if the setting is set to true. --Ciencia Al Poder (talk) 19:39, 14 June 2018 (UTC)

Is there a way to get the most viewed pages list?[edit]

In former mediawiki versions I used to get the most viewed pages list with the mysql query:

SELECT DISTINCT page_title FROM page WHERE page_namespace=0 AND page_counter >= 0 ORDER BY page_counter DESC;

The page_counter field is not there any more. Is there a way to accomplish the task I were used to?

DonPaolo (talk) 14:17, 30 October 2018 (UTC)