Talk:Wikimedia technical search

Original comments from Nike's blog

 * Copied here for future reference; originally from laxstrom.name/blag

@Nemo, you should place the list of URLs for the custom search engine somewhere public, preferably in source control (github?) so people could contribute additions / tweaks.

Specifically, I’d love if the search included blog posts from Planet Wikimedia and Open Planet Wikimedia. Technical village pumps and perhaps template talk pages would also be good places to search.

Finally, it seems obvious that the code itself should be searchable (e.g. single-line comments that don’t make it to Doxygen, or even class/function/variable names).

Waldir 14:22, 11 February 2013 (UTC)


 * @Waldir: I’ve just placed the list on the wiki; there isn’t any way to sync it automatically, is there? Planets can’t be searched: they’re ephemeral resources without archives, so you’d be searching only the last few days. Is this OK? Technical village pumps and template_talk namespaces are quite a challenge, but I guess they could be included if someone made a list. Or you can add them yourself; I’ve added you as admin.
 * Nemo 17:27, 11 February 2013 (UTC)


 * I never tried creating a custom Google search engine before, but I doubt there's an automatic way to keep it up to date, which is a shame. I've heard that Yahoo's BOSS is quite powerful, but I don't know whether it allows automatic updating either (e.g. from a text file somewhere public, a git repo, etc.). It might be interesting to try out nevertheless.
 * As for the planets, that's quite a shame. I wonder if we should instead include the URLs of the blogs aggregated by the planet(s). The blogs are quite an important resource.
 * I see that code search is already included. So that leaves, from my suggestions:
   * blog posts from Planet Wikimedia (needs to be regularly updated, from here)
   * Technical village pumps
   * Template talk pages
 * And a new one I just thought of:
   * Signpost Technology Reports (including possible previous URL formats using the BRION nomenclature)
 * What else? --Waldir (talk) 21:25, 11 February 2013 (UTC)
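
If the list of URL patterns were kept as a plain text file in source control, as suggested above, contributed additions could be cleaned up before being pasted into the search engine's settings. This is only a minimal sketch: the file format (one pattern per line, `#` for comment lines) and the function name `parse_site_list` are assumptions made for illustration, and the sample entries are hypothetical, not the actual list.

```python
def parse_site_list(text):
    """Return sorted, de-duplicated URL patterns from a plain-text list.

    Assumed format (hypothetical): one pattern per line; blank lines and
    lines starting with '#' are ignored.
    """
    patterns = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        patterns.add(line)
    return sorted(patterns)


# Hypothetical sample entries, for illustration only.
sample = """
# wikis
www.mediawiki.org/*
# code
github.com/wikimedia/*
www.mediawiki.org/*
"""

print(parse_site_list(sample))
```

Case is left untouched on purpose, since wiki page titles in URLs are case-sensitive.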

Gitweb not indexable
https://www.google.com/search?q=site:https://gerrit.wikimedia.org/r/gitweb returns a single result, with the following non-description: "A description for this result is not available because of this site's robots.txt". What purpose does that setting serve? --Waldir (talk) 21:29, 11 February 2013 (UTC)
 * I asked on IRC. Apparently this was for performance reasons (gitweb couldn't cope with crawlers). Possibly the upcoming update to gitblit will make things better. Logs of that conversation should be available at bots.wmflabs.org/~wm-bot/logs/%23wikimedia-tech/201302.txt sometime soon; timestamps approx. 21:32 to 21:42. --Waldir (talk) 21:44, 11 February 2013 (UTC)
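
For anyone wanting to check how a robots.txt rule of this kind blocks crawlers, here is a small sketch using Python's standard `urllib.robotparser`. The rules below are an assumption made up for the example, not a copy of the real robots.txt on gerrit.wikimedia.org, and `is_crawlable` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, NOT the actual robots.txt of gerrit.wikimedia.org.
rules = """\
User-agent: *
Disallow: /r/gitweb
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse from text, no network access


def is_crawlable(url, agent="*"):
    """Return True if the given user agent may fetch the URL under the rules above."""
    return parser.can_fetch(agent, url)


# Blocked: the path starts with the disallowed prefix.
print(is_crawlable("https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git"))
# Allowed: the path does not match any Disallow rule.
print(is_crawlable("https://gerrit.wikimedia.org/r/"))
```

Under these assumed rules the first call prints `False` and the second `True`, which matches the "description not available" behaviour seen in the search result.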

Bugzilla not indexed
See http://www.google.com/search?q=site:bugzilla.wikimedia.org -- it says HTTPS has to be used. http://www.google.com/search?q=site:https://bugzilla.wikimedia.org doesn't do any better. --Waldir (talk) 22:52, 11 February 2013 (UTC)

Sandbox
Less user-friendly, but immediate for testing purposes: https://www.google.com/search?q=site:mediawiki.org+OR+site:github.com/wikimedia/ --Waldir (talk) 22:52, 11 February 2013 (UTC)
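
A query like the one above can be generated from a list of sites instead of being typed by hand. The following is a rough sketch with Python's standard `urllib.parse`; the helper name `site_search_url` is made up for this example, and it relies only on the `site:` and `OR` operators already used in the link above.

```python
from urllib.parse import urlencode


def site_search_url(sites, terms=""):
    """Build a Google search URL restricted to the given sites (hypothetical helper)."""
    site_filter = " OR ".join(f"site:{s}" for s in sites)
    query = f"{terms} {site_filter}".strip()
    # urlencode percent-escapes ':' and '/' and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url(["mediawiki.org", "github.com/wikimedia/"]))
```

The resulting URL is percent-encoded (`site%3Amediawiki.org+OR+...`), so it looks different from the hand-written link even though it expresses the same query.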