Talk:Wikimedia technical search

Original comments from Nike's blog

 * Copied here for future reference; originally from laxstrom.name/blag

@Nemo, you should place the list of URLs for the custom search engine somewhere public, preferably in source control (GitHub?), so people can contribute additions and tweaks.

Specifically, I’d love it if the search included blog posts from Planet Wikimedia and Open Planet Wikimedia. Technical village pumps and perhaps template talk pages would also be good places to search.

Finally, it seems obvious that the code itself should be searchable (e.g. single-line comments that don’t make it to doxygen, or even class, function, and variable names).

Waldir 14:22, 11 February 2013 (UTC)


 * @Waldir: I’ve just placed the list on the wiki; there isn’t any way to sync it automatically, is there? Planets can’t be searched: they’re ephemeral resources without archives, so you’d be searching only the last few days. Is this ok? Technical village pumps and template_talk namespaces are quite a challenge, but I guess they could be included if someone made a list. Or you can add them yourself; I’ve added you as admin.
 * Nemo 17:27, 11 February 2013 (UTC)


 * I've never tried creating a custom Google search engine before, but I doubt there's an automatic way to keep it up to date, which is a shame. I've heard that Yahoo's BOSS is quite powerful, but I don't know whether it allows automatic updating either (e.g. from a text file somewhere public, a git repo, etc.). It might be interesting to try out nevertheless.
 * As for the planets, that's a pity. I wonder if we should instead include the URLs of the blogs aggregated by the planet(s). The blogs are quite an important resource.
 * I see that code search is already included. So that leaves, from my suggestions:
   * blog posts from Planet Wikimedia (needs to be regularly updated, from here)
   * Technical village pumps
   * Template talk pages
 * And a new one I just thought of:
   * Signpost Technology Reports (including possible previous URL formats using the BRION nomenclature)
 * What else? --Waldir (talk) 21:25, 11 February 2013 (UTC)
 * Maybe there's a way to create a mirror of the planet which is easier to update. It could be just FeedBurner or Google Reader or whatever: do you know any such services? Updating hundreds of entries by hand doesn't look fun, unless you think it's worth adding only a smallish subset of tech-oriented blogs. --Nemo 06:38, 12 February 2013 (UTC)

Gitweb not indexable
https://www.google.com/search?q=site:https://gerrit.wikimedia.org/r/gitweb returns a single result, with the following non-description: "A description for this result is not available because of this site's robots.txt". What purpose does that setting serve? --Waldir (talk) 21:29, 11 February 2013 (UTC)
 * I asked on IRC. Apparently this was for performance reasons (gitweb couldn't cope with crawlers). Possibly the upcoming switch to gitblit will make things better. Logs of that conversation should be available here some time from now; timestamps approx. 21:32 --> 21:42. --Waldir (talk) 21:44, 11 February 2013 (UTC)
 * It's enough to look in ^demon's talk, where I asked the same question. ;) I had added it "just in case". --Nemo 06:38, 12 February 2013 (UTC)
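
For anyone who wants to check this kind of block without waiting for Google, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules in `ASSUMED_ROBOTS_TXT` are hypothetical and only illustrate a `Disallow` on the gitweb path; they are not a copy of the real gerrit.wikimedia.org robots.txt, and `is_crawlable` is a helper name made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the actual
# robots.txt served by gerrit.wikimedia.org.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /r/gitweb
"""


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


gitweb_allowed = is_crawlable(
    ASSUMED_ROBOTS_TXT,
    "https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git",
)
root_allowed = is_crawlable(ASSUMED_ROBOTS_TXT, "https://gerrit.wikimedia.org/r/")
print(gitweb_allowed, root_allowed)
```

To test the live file instead, `RobotFileParser.set_url()` followed by `read()` fetches it over the network.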

Bugzilla not indexed
See http://www.google.com/search?q=site:bugzilla.wikimedia.org -- it says HTTPS has to be used. http://www.google.com/search?q=site:https://bugzilla.wikimedia.org doesn't do any better. --Waldir (talk) 22:52, 11 February 2013 (UTC)
 * This is weird, it used to work. If Gmane works, however, it would be a duplicate of wikibugs. --Nemo 06:38, 12 February 2013 (UTC)

Sandbox
Less user-friendly, but immediate for testing purposes: https://www.google.com/search?q=site:mediawiki.org+OR+site:github.com/wikimedia/ --Waldir (talk) 22:52, 11 February 2013 (UTC)
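
A minimal sketch of how such a test URL could be generated from a list of sites, so the sandbox query does not have to be edited by hand. `build_site_query` is a hypothetical helper name; the only assumption about Google is the `site:` and `OR` query syntax already used in the link above.

```python
from urllib.parse import urlencode


def build_site_query(sites, terms=""):
    """Build a Google search URL restricted to any of the given sites."""
    # Join the restrictions with OR, as in the hand-written sandbox URL.
    restriction = " OR ".join(f"site:{site}" for site in sites)
    query = f"{terms} {restriction}".strip()
    # urlencode() percent-escapes the query and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


url = build_site_query(["mediawiki.org", "github.com/wikimedia/"])
print(url)
```

Passing search words as `terms` puts them in front of the site restrictions, e.g. `build_site_query(sites, "parser hooks")`.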

Duplicates

 * Why were duplicates added, like  and its subset  ? I think the wildcard crosses directories and everything.
 * Gmane is not crawled, but some of its subdomains indeed are (comments.gmane.org, perhaps permalink, what else?). I'd use only Gmane.
 * mediawiki-cvs/mediawiki-commits was added to search commit messages and code review comments; doesn't GitHub add duplicates? --Nemo 06:38, 12 February 2013 (UTC)
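
If the wildcard really does cross directories, as supposed above, redundant entries could be pruned from the list mechanically. The sketch below assumes exactly that: a pattern ending in a single `*` matches every URL sharing its prefix. That matching rule is an assumption rather than documented behaviour of the search engine, and both `prune_subsumed` and the `example.org` patterns are hypothetical.

```python
def prune_subsumed(patterns):
    """Drop patterns already covered by a broader trailing-wildcard pattern.

    Assumes each pattern has at most one "*", at the very end, and that
    such a wildcard matches any continuation of the prefix, including
    deeper directories.
    """
    unique = sorted(set(patterns))

    def covers(broad, narrow):
        # "a/b/*" covers "a/b/c/*" because "a/b/" is a prefix of it.
        return (
            broad != narrow
            and broad.endswith("*")
            and narrow.startswith(broad[:-1])
        )

    return [p for p in unique if not any(covers(q, p) for q in unique)]


# Hypothetical patterns for illustration only.
patterns = [
    "example.org/docs/*",
    "example.org/docs/api/*",
    "example.org/blog/*",
]
pruned = prune_subsumed(patterns)
print(pruned)
```

The comparison is quadratic in the number of patterns, which should be fine for a hand-maintained list of a few hundred entries.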