User talk:Legoktm/GSoC 2013

There's one part of this project that maybe wasn't clear in the project description and so isn't addressed in your plan. The format of the *fulls* would have to be changed too, and a script to write the old XML format from the new fulls should be written. The reason is that we already get (mostly) only new information from the database servers, but reading the old XML files, uncompressing them, running integrity checks to make sure the content is probably good, and copying it to the new full takes several days for en wp, even with 27 processes running. The current format needs to die a horrible death and be replaced with something better. Can you incorporate this in your proposal? -- ArielGlenn (talk) 07:28, 24 April 2013 (UTC)

Thanks for the clarification. I've worked it into the proposal and will add it to the schedule soon. Legoktm (talk) 08:42, 24 April 2013 (UTC)


This is mostly unrelated to your proposal, but when I think of the database dumps we currently provide, I think of two issues:

  • XML is an awful format and should be switched to something better (such as JSON); and
  • SQL dumps are currently missing a number of important tables (e.g., the user table).

I don't know if either of these issues could be addressed by your work, but if so, that would be wonderful! --MZMcBride (talk) 16:20, 25 April 2013 (UTC)