User:Kmenger/Tool Labs

From mediawiki.org

The etherpad for the Tool Labs doc sprint, July 23-25, 2013 is here: http://etherpad.wmflabs.org/pad/p/Tool_Labs_Sprint_July_23

The draft of the new Tool Labs guide: User:Kmenger/ToolLabsGuide

TODO for improving Tool Labs documentation

New questions for Coren

Section 7.2 Connecting to the database replicas This is an old question, but I'm still confused about the database aliases, and the examples given are inconsistent. Currently, two examples for connecting to the enwiki database are given:

1. mysql --defaults-file="${HOME}"/replica.my.cnf -h enwiki.labsdb enwiki_p 

and (from the FAQ)

2. mysql --defaults-file=~/replica.my.cnf -h enwiki.labsdb

[The FAQ example does not include the 'enwiki_p'] [MAP: because that example connects to the right cluster without connecting to a specific database; that's what you would do if you wanted to create a database on the same cluster as enwiki's]

The other examples (to wikidatawiki and commonswiki.labsdb) do not include the database alias (wikidatawiki_p, etc) either. Is example 1 a mistake? Or is the alias here important/should we update the other example?

[MAP: It depends on context. For instance, example 1 is strictly equivalent to:

mysql --defaults-file="${HOME}"/replica.my.cnf -h enwiki.labsdb

followed (in mysql) by:

connect enwiki_p

so it really depends on what the user wants to do (connect to the cluster vs. connect to a specific database on that cluster)]
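To make the distinction concrete, here is a minimal Python sketch that builds both forms of the command. The helper name `replica_command` is hypothetical; the flags, host alias and database name are the ones from the examples above, and the sketch only assembles the argument list rather than running mysql.

```python
import os
import shlex


def replica_command(wiki, database=None, home=None):
    """Build the mysql argument list for a replica host alias.

    `replica_command` is a hypothetical helper for illustration only.
    With `database` omitted, the command connects to the cluster without
    selecting a database; with it, the database is selected up front.
    """
    home = home or os.path.expanduser("~")
    args = [
        "mysql",
        "--defaults-file=" + os.path.join(home, "replica.my.cnf"),
        "-h", wiki + ".labsdb",
    ]
    if database is not None:
        args.append(database)
    return args


if __name__ == "__main__":
    # Cluster only (the FAQ form), then cluster plus database (example 1).
    print(shlex.join(replica_command("enwiki")))
    print(shlex.join(replica_command("enwiki", "enwiki_p")))
```

The only difference between the two results is the trailing database name, which matches MAP's point that the choice depends on whether the user wants the cluster or a specific database on it.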

Section 7.4 Joins between commons and wikidata and other project databases This is still TK

Section 8.4.2 Using ‘qstat’ to return status info There is a ‘job number’ and a ‘job id’, which appear to be the same (at least in the simple examples I’ve looked at). Are they in fact the same thing? If not, how do they differ?

[MAP: They are the same, the job id is a number]


Section 9.5.1 Running scripts http://www.mediawiki.org/wiki/User:Kmenger/ToolLabsGuide#Running_scripts “The web server allows overrides of AuthConfig DirectoryIndex FileInfo Options=IncludesNOEXEC from .htaccess.” --When/why would you do this/override each of these settings?

[MAP: this is one of those cases where "if you need to use one of the overrides, you should already know what they are for". We could point at the Apache documentation, but each of those overrides has (possibly numerous) effects on how the web server processes requests in the directory where they are found. AuthConfig overrides, for instance, enable a number of directives used for authentication.]


Section 9.6 PyWikipediaBot Seems there are at least three ways to set this up: cloning the pywikibot framework to the tool directory (as shown in 9.6.1), using the snapshot at /shared/pywikipedia/rewrite (section 9.6.2), or using virtualenv (9.6.3). Should we recommend one way in particular and just show step-by-step instructions for it? If so, which setup should we feature? Also, do we want to recommend that users clone spelling (one set of instructions does, the other doesn’t; evidently spelling is large)? Also, the renaming of rewrite/ to ‘core’ and trunk/ to ‘compat’ is kind of confusing (that’s more of an observation than a question, but I figured I’d throw it out there).

[MAP: Outside my field of expertise. I should expect that one of our devs more familiar with Python in general and pywikibot specifically could provide help. I can guess at some of the answers, but you'd better get it from the horse's mouth]


Section 10 Redis Maybe this is totally obvious to people who will use this, but I was wondering if it would be useful to note where the libraries are/add an example of this. I point to SuchABot. Is that okay to do?

[MAP: It may be useful to give more detailed information on specific libraries, but I think that should really be spun off in a separate tutorial since it'd be a howto that's very much not Tool Labs-specific]


Questions and initial notes, with some additional notes from July 15 meeting

- Would love to add a more detailed overview (features, rationale, basic architecture, open-source rule, etc.) as an introduction, based on this really helpful presentation: http://www.mediawiki.org/wiki/File:Tool_Labs_presentation_(Hackaton_2013).pdf. High-level examples of tools and bots (e.g., edit counter/edit bot) and where one encounters them were also very helpful to me.

[Add overview here: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools]

- I don’t fully understand the difference between tools and toolsbeta. What types of development would go where? How do I choose which project to use? Are there different ways of accessing each? Are there things that should go on one, and not the other?

  • toolsbeta is for experiments in the tool labs environment itself; things like new systems entirely, or when you need experimental versions of system libraries that could affect other users. In general, every tool maintainer should work primarily on tools, only doing work on toolsbeta when changes to the tool labs need to be tested to support their tool.  

-The instructions for creating a key/requesting account are very helpful! I got confused when I tried to ssh in. 

-The instructions say 24 hours or less to process a Tool Labs account request. I waited about 24 hours before asking for help on IRC.  

  • The delay is generally less than a day; but it does depend on admin availability (i.e.: either one of the ops taking care of labs or one of the project admins needs to handle it).  

-What are the rules for using bots? I wondered about this and was (and am) worried about running my bot or changing anything. I would have appreciated some advice on this matter.

[MORE INFO: http://meta.wikimedia.org/wiki/Bot and, for the Wikipedia Bot Approvals Group: http://en.wikipedia.org/wiki/Wikipedia:Bot_Approvals_Group]

SEE ALSO: https://meta.wikimedia.org/wiki/Bots specifically https://meta.wikimedia.org/wiki/Bot_policy 

  • The rules for running a bot on the Labs are fairly liberal (while still in draft, they sum up as "don't break anything"). The rules for running bots on projects are decided by the projects themselves and are not part of the Labs rules (although we will certainly have a provision that you should abide by the rules of every project you interact with).

-Add a 'Contact us' section with IRC/mailing list/Bugzilla info (https://bugzilla.wikimedia.org/enter_bug.cgi?product=Wikimedia%20Labs). Any other points of contact?

-This info is in the PowerPoint slides but not, I believe, in the guide, and it seems important to note: 'There are two bastion hosts: Ssh://tools-login.wmflabs.org Ssh://tools-dev.wmflabs.org This is where users log in and do their thing. They are functionally identical, but we request that heavy processing not take place on -login (compiles, etc) to keep interactive performance snappy.'

-Can you commit code to git via gerrit? If so, how? Magnus notes that WMF Labs does not offer an easy-to-use code repository and offers other suggestions: https://wikitech.wikimedia.org/wiki/User:Magnus_Manske/Migrating_from_toolserver#GIT Is there a recommended way/setup?

  • WMF's gerrit repo works from the labs without difficulty, but tool maintainers have to ask for a project to be created for their tools (there is no provision to create them automatically)

-Creating a new tool account. Seems there are two ways to do this: from the Tool Labs webpage > Create tool, or from https://wikitech.wikimedia.org/wiki/Special:NovaProject > add service group. Are there any differences between these? Can a new tool account only be created via the web interface? Also, adding/removing users to an account: is that also only possible through the web UI?

  • They both point to the same place.    

-Am thinking it would be helpful to add an “Accessing Tool Labs” section to consolidate the info about this (web, ssh, putty/winscp, sftp (a user mentions that this works and is currently undocumented), GUI editor)

-The first thing I encountered/wondered about when I accessed my account was the replica.my.cnf file. I figured out what it is/how it is used from the database access section, but I also liked the idea of mentioning this information in the ‘creating a new tool account’ section as that is the first place I looked.      

-Thinking that it might be nice to create a section along the lines of ‘Configuring a tool account’ with info about creating a tool description/webpage, mail, adding/removing users, maybe the pywikibot setup, etc. Some of this info is in the doc, but it gets a little lost in sections dedicated to other topics.

-Russell Blau’s Using pywikibot on Labs doc is really helpful! https://wikitech.wikimedia.org/wiki/User:Russell_Blau/Using_pywikibot_on_Labs Are there other common tool set ups that could/should be documented?    

-That said, re: Blau, although I was able to successfully set a path variable, create a sub-directory in my home directory for bot-related files, add my user-config.py file, and run a test (steps 1-4), I wasn’t able to run my simple test bot, which is (sans all paths): python pwb.py basic -dry User:KBot500

I tried several ways, e.g., with a fully qualified path to pwb.py (/shared/pywikipedia/rewrite/pwb.py), and with full paths to both pwb and basic. I experimented with the symbolic link as well. I managed to run the bot from my local machine, with the user-config.py I added to Tool Labs.

-I haven’t yet been able to test out the grid access. Are there some bots/tools that I can practice with/look at? If so, what’s the best way to get these? A shared ‘Examples’ Tools account or something along those lines? Something else? I'd love to see examples of bots that one would run continuously, and tools that create/access databases in different ways.     

  • UPDATE: From IRC, YuviPanda and logoktm recommend setting up virtualenv and installing pwb to that.     

**** This is as far as I’ve gotten in using tool labs. More as it comes!     

TODO: Search should get Toolsbeta information.

TODO: Do the rules for running a bot on the Labs exist on wiki?

TODO: on wikitech home page/Usage: Clicking Number of projects will show labs projects.

TODO: search for tools beta or toolsbeta gets you to info on toolsbeta

My trouble running pywikibot on Tool Labs

UPDATE: From IRC, YuviPanda and logoktm recommend setting up virtualenv and installing pwb to that.      

Steps followed and subsequent error: 

1.     I followed Blau’s guide to setting up pywikibot on Labs: https://wikitech.wikimedia.org/wiki/User:Russell_Blau/Using_pywikibot_on_Labs. I used a user-config.py that works on my system and successfully ran: 

local-kirstentest@tools-login:~/.pywikibot$ python /shared/pywikipedia/rewrite/scripts/version.py

2.     I tried to run my bot, which I ran locally with no trouble as: python pwb.py basic -dry User:KBot500. To run it in Labs, I put in full paths:

local-kirstentest@tools-login:~/.pywikibot$ python /shared/pywikipedia/rewrite/pwb.py /shared/pywikipedia/rewrite/scripts/basic -dry User:KBot500

I got the following error:

Traceback (most recent call last):
  File "/shared/pywikipedia/rewrite/pwb.py", line 35, in <module>
    execfile('generate_user_files.py')
IOError: [Errno 2] No such file or directory: 'generate_user_files.py'
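The traceback suggests that pwb.py asks for 'generate_user_files.py' by a bare relative name, which Python resolves against the current working directory (here ~/.pywikibot) rather than the directory holding pwb.py. The following Python sketch illustrates that behaviour in general; `cwd_relative` and `script_relative` are hypothetical helpers, not part of pywikibot, and whether this is the actual cause of the error is an assumption.

```python
import os


def cwd_relative(name):
    """Path a bare relative name resolves to: the current working directory."""
    return os.path.join(os.getcwd(), name)


def script_relative(script_path, name):
    """Path of `name` next to the given script (hypothetical helper)."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, name)


if __name__ == "__main__":
    script = "/shared/pywikipedia/rewrite/pwb.py"
    # Where a bare relative name would be looked up from the current directory.
    print(cwd_relative("generate_user_files.py"))
    # Where the file would sit if it lives next to the script.
    print(script_relative(script, "generate_user_files.py"))
```

If the assumption holds, the two printed paths differ whenever the command is run from outside the script's directory, as in step 2.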

Feedback on bots installing/configuring, and running pywikibot:

-The Python on my Mac laptop (approx. three years old) was too old for pywikibot. It's not just PC users who might need to download Python!

-The magic keywords in the presentation are incomplete. I found the full set in an older version of the presentation: http://upload.wikimedia.org/wikipedia/mediawiki/archive/9/9b/20130526081606%21Bots_hackathon_2013.pdf