Evaluating and Improving MediaWiki web API client libraries/Status updates

5 June 2014

 * What is it to be Pythonic?
 * http://www.dirtsimple.org/2004/12/python-is-not-java.html
 * http://blog.startifact.com/posts/older/what-is-pythonic.html
 * http://www.cafepy.com/article/be_pythonic/
 * Python style guide: http://legacy.python.org/dev/peps/pep-0008/
 * Docstring conventions: http://legacy.python.org/dev/peps/pep-0257/
 * Wikimedia diversity conference notes: http://adainitiative.org/2013/11/wikimedia-diversity-conference/
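To make the PEP 257 link above concrete, here is a minimal sketch of the docstring conventions (the function and its behavior are my own invented example):

```python
def greet(name):
    """Return a greeting for the given name.

    Per PEP 257: a one-line summary ending in a period, a blank line,
    then any further detail; the closing quotes of a multi-line
    docstring go on their own line.
    """
    return "Hello, " + name + "!"

print(greet("Pythonista"))
```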


 * Evaluating simplemediawiki:
 * wikidata API works better when you know it's http://www.wikidata.org/w/api.php not http://wikidata.org/w/api.php <--- WHY IS THIS INCONSISTENT
 * NEVER MIND, need to have www.mediawiki.org/w/api.php too... automatic browser redirects strike again
 * bug in the Wikidata API: the new continue param doesn't work with wbsearchentities; see what http://www.wikidata.org/w/api.php?action=wbsearchentities&search=abc&language=en&continue=&format=json yields
 * in contrast, see https://en.wikipedia.org/w/api.php?action=query&list=allcategories&acprefix=List%20of&continue=&format=json
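The new "continue=" protocol seen in the allcategories example above can be sketched with a small helper. This is my own illustration of the merge step (the parameter names come from the example URLs above; the merge logic is assumed from the API's continuation behavior):

```python
def next_request_params(params, response):
    """Merge the API's 'continue' block into the next request's params.

    With the new continuation protocol (first request sent with
    continue=''), the response carries a top-level 'continue' dict
    whose keys are simply copied over the previous request's params.
    Returns None when the result set is complete.
    """
    if "continue" not in response:
        return None
    merged = dict(params)
    merged.update(response["continue"])
    return merged

# Example shaped like the allcategories query above:
params = {"action": "query", "list": "allcategories",
          "acprefix": "List of", "continue": "", "format": "json"}
response = {"continue": {"accontinue": "List of Abc", "continue": "-||"},
            "query": {"allcategories": []}}
follow_up = next_request_params(params, response)
# follow_up now carries accontinue and the updated continue token
```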

3 June 2014
Notes from talking with Yuri and Sumana; suggestion: pretend everything's old
 * API is written like SQL; amazing but not really cacheable.
 * APIs--shopping cart. Good for bandwidth, direct optimization, one request.
 * or: caching technologies, work per request.
 * Google: no per-request caching, OK.
 * but MWiki: every person sees same content (ish--mod stuff like individual gender).
 * preferences do mean that not-one-content-fits-all.
 * API: more suited to non-localized/non-gendered.
 * but content/metadata is less fragmentable, *should* be highly cacheable.
 * wants to make more "blobbable": blobs with keys
 * Varnish: reverse proxy, caching mechanism between backend and user
 * Tollef used to work on Varnish.
 * [read performance guidelines that Sumana's been drafting]
 * blob = binary large object
 * "not all blobs are created equal" some blobs could be understandable by Varnish, and Varnish can look inside to find executables, and can replace it with another blob.
 * "edge-side includes"
 * caching: everything cached has an expiration date
 * AJAX--a way of using JS to make pages more dynamic (now minus the XML)
 * if you do this on client side with AJAX there will be *two* server calls but not for
 * server(center)-edge(customerish)
 * backend caching infrastructure / front server / "edge caching" / client
 * 2 hard problems in CS: cache invalidation, naming things, and off-by-1 errors.
 * caching layers
 * memcached/memcache(d) <--- came from LiveJournal (LJ)
 * key-value store < file structure
 * don't put things in the cache that you permanently want in your life: put them in a "store" <--- connotation of permanence
 * memcached evicts things; you can't assume that things stay there. (You can say: give it a life of 12 h.)
 * put enough stuff in there that there's a high cache-hit ratio.
 * so, redoing API:
 * on the one hand, simple and cacheable.
 * otoh, backward compatibility.
 * breakaway path: one new API that's blobbable/cacheable/content-focused
 * actually use http error codes.
 * current API: bunch of minor bugs, inconveniences, verbosity. Cleaning up is tedious and high-risk/low-reward.
 * library designed from ground-up to take advantage of SQL-ish API structure
 * Yuri: works in Python, C#, PHP
 * Python: use requests to minimize code and maximize readability; stay Python 3-compatible.
 * old vs. new MediaWikis--developing to the old API is inefficient; developing to the new one means you have to "fake it on the client" if the server doesn't support it
 * framework = client library (like pywikibot)
 * should use the NEW continue method
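Since the note above says a library should use the NEW continue method, here is a hedged sketch of the full loop. The `fetch` callable is injected (my own design choice) so the control flow stays visible and testable without tying it to one HTTP library:

```python
def query_all(fetch, params):
    """Yield each response page, following new-style continuation.

    `fetch` is any callable that takes a params dict and returns the
    decoded JSON response (e.g. a thin wrapper over an HTTP library).
    """
    params = dict(params)
    params["continue"] = ""          # opt in to the new continuation
    while True:
        response = fetch(params)
        yield response
        if "continue" not in response:
            break                    # result set complete
        params = dict(params)
        params.update(response["continue"])

# Fake fetcher simulating a two-page result set:
pages = [{"batch": 1, "continue": {"accontinue": "M"}}, {"batch": 2}]

def fake_fetch(params):
    return pages[1] if "accontinue" in params else pages[0]

results = list(query_all(fake_fetch, {"action": "query"}))
# results holds both pages, in order
```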

28 May 2014
Things I want to do today:
 * Finish making slides.
 * Write an "about APIs" introductory post (Done! http://franceshocutt.com/2014/05/28/a-beginners-definition-of-web-api/)
 * Outline a "resources for the MediaWiki API" post
 * See if there's anything else to do; start evaluating libraries?
 * Practice talk
 * Do these kindly.

27 May 2014

 * Wikimedia technical search: https://www.google.com/cse/home?cx=010768530259486146519:twowe4zclqy
 * First revision of library gold standard: API talk:Client code
 * Java package repository: Maven
 * http://maven.apache.org/guides/getting-started/maven-in-five-minutes.html
 * http://news.dice.com/2013/10/03/a-walk-through-the-java-ecosystem-082/
 * Rust language's conduct policy
 * on bashing your head against problems: http://www.mattringel.com/2013/09/30/you-must-try-and-then-you-must-ask/
 * on Ruby gems: https://github.com/radar/guides/blob/master/gem-development.md

22 May 2014

 * finished initial evaluation of all client libraries
 * Java's still a bit shaky, but wooooo!


 * "Best library" count: 1 Ruby, 4 Python, 1 Perl, 3-4 Java (?), ~3 JavaScript
 * Added links here: https://meta.wikimedia.org/wiki/Research:Resources#Research_Tools:_Statistics.2C_Visualization.2C_etc. and here: http://wikipapers.referata.com/wiki/List_of_tools
 * Went through the rest of the libraries in API:Client code to check for last update date, put that info on the page


 * Offered @edupunkn00b API help when she gets there on her own project; she'll make notes of roadblocks in documentation/learning


 * some links with JS/API resources: User:Waldir


 * went over the gold standard stuff with Mithrandir on IRC: "I think having that gold standard there is good and while I'm picking at many of the individual points, the collective seems well-thought out to me."
 * emailed mentors with progress/request for feedback


 * TODO:
 * Draft slides for API talk
 * Start an outline of resources for API workshop!

21 May 2014

 * updated https://en.wikibooks.org/w/index.php?title=Perlwikibot&stable=0 to remove much obsolete information
 * continuing to evaluate library capabilities (Python)
 * discussed upcoming WikiConference2014 talk with Sumana (IRC)
 * Signed up for a lightning talk on feminist hackerspaces: http://wikiconferenceusa.org/wiki/Lightning_Talks#Sign_up
 * Thinking a lot about this in the context of "open knowledge": http://dawnnafus.files.wordpress.com/2008/09/patches-revised2.pdf (via @betsythemuffin on twitter) [citation: http://nms.sagepub.com/content/early/2011/11/09/1461444811422887]
 * Cleaned up this a bit: Creating_a_bot (Perl and Ruby sections, Python and Java looked fine)
 * Links from @edupunkn00b on REST: http://skillcrush.com/2012/07/13/rest/, http://www.infoq.com/articles/rest-introduction
 * More than you ever wanted to know about git workflow possibilities: https://www.kernel.org/pub/software/scm/git/docs/user-manual.html#the-workflow
 * TODO:
 * Email Tollef/Brad/Merlijn tonight with what I have and some questions about JS/Java and then what developers want in a library
 * Start writing up criteria/"gold standard", considering a few different developer-users (novice to expert?)

20 May 2014

 * meeting
 * Evaluating and Improving MediaWiki web API client libraries/Progress Reports
 * Today: evaluate library capabilities, make notes on API talk:Client code
 * Finished Ruby, Perl, Java, part of Python notes
 * Installed Eclipse IDE for Java
 * Asked advice about Java conventions/language structure (classes, all classes, forever classes)
 * https://github.com/mwclient/mwclient/issues/39 filed small doc bug on mwclient
 * consider editing Creating_a_bot

19 May 2014
Today I officially start my OPW internship!


 * Things To Do:
 * put Evaluating_and_Improving_MediaWiki_web_API_client_libraries/Status_updates/Search_results into API:Client Code
 * begin evaluating libraries against the following criteria:
 * Has it been updated in the last 12 mo?
 * Does it have a lot of open bugs/pull requests, especially compared to the number closed?
 * Does it have documentation, code samples, and tests provided?
 * does it, at the minimum, handle logins/cookies/continuations? (even "syntactic sugar" libraries should do these things)


 * Results, resources, misc from an IRC meeting with Sumana, Tollef, et al.:
 * Reminder that github graphs exist, like: https://github.com/dreamwidth/dw-free/graphs/contributors
 * Data for Wikimedia traffic:
 * https://stats.wikimedia.org/wikimedia/squids/SquidReportClients.htm
 * http://stats.wikimedia.org/wikimedia/squids/SquidReportCrawlers.htm
 * Breaking changes to the API (and therefore a timeline of changes that API client library developers should have taken note of) are (or very much should be) mentioned in the release notes (Release_notes/1.22), on http://lists.wikimedia.org/pipermail/mediawiki-api-announce/, and in HISTORY (https://git.wikimedia.org/blob/mediawiki%2Fcore.git/master/HISTORY).
 * the support-matrix on Wikia (http://api.wikia.com/wiki/Client_libraries#Notes) was last updated in 2011 (http://api.wikia.com/wiki/Client_libraries?action=history), which I believe is after Wikidata was started
 * Python's requests library handles cookies: http://docs.python-requests.org/en/latest/user/quickstart/#cookies
 * IRC:
 * http://en.flossmanuals.net/GSoCStudentGuide/ch014_communication-best-practices/
 * pastebin for sharing multiline code/error/results things: http://tools.wmflabs.org/paste/
 * http://www.harihareswara.net/sumana/2014/02/26/0
 * Wikimedia mailing lists: Mailing_lists/Overview
 * commentary on localization: http://aharoni.wordpress.com/2011/08/24/the-software-localization-paradox/ (came up when discussing the [lack of] API localization)
 * from commentary on pywikibot, Python 2 vs. 3 as another slow deprecation process:
 * http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html
 * https://wiki.python.org/moin/Python2orPython3
 * http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#why-is-python-3-considered-a-better-language-to-teach-beginning-programmers
 * http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#slow-uptake
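The note above that Python's requests library handles cookies is easy to demonstrate: a requests.Session persists cookies across calls, which is what a MediaWiki login flow relies on. The demo below sets a cookie locally (no network); the commented-out call is a rough, untested sketch of what a real login would look like:

```python
import requests

session = requests.Session()
# Stand-in for the cookie a login response would set:
session.cookies.set("demo_cookie", "value")

# A real login would look roughly like (not run here):
# r = session.post("https://en.wikipedia.org/w/api.php",
#                  data={"action": "login", "format": "json"})
# ...later session.get()/post() calls send the cookies back automatically.

print(session.cookies.get("demo_cookie"))
```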

Explanation of various JavaScript variants: http://organicdonut.com/?p=479

2 April 2014
Currently reading: http://aosabook.org/en/index.html.

Evaluating and Improving MediaWiki web API client libraries/Status updates/Search results

http://wikiconferenceusa.org/wiki/Submissions:Using_web_API_client_libraries_to_play_with_and_learn_from_our_%28meta%29data

http://notabilia.net/

http://journal.code4lib.org/articles/8962

http://blog.hatnote.com/

http://seealso.hatnote.com/

Wikimedia research hub: m:Research:Resources. List of tools: http://wikipapers.referata.com/wiki/List_of_cross-platform_tools.

12-19 March 2014

 * Starting out:
 * Learned what APIs are and discussed with Sumana what the point of an API library is
 * Ideally, it provides affordances that let you access the deeper wiki structure in an intuitive and functional manner
 * Asked around for well documented APIs that other people have suggested
 * Ruby/S3 SDK
 * Google Drive
 * Google Android
 * Mailchimp
 * Looked at the code and the documentation for the Python libraries listed on API:Client Code
 * Noticed that some of the libraries created layers of abstraction around the MediaWiki API, and others were very simple wrappers over the MediaWiki API
 * Compared the three simple libraries on maintenance status, documentation quality, and whether they include unit tests (early revision)
 * Attempted to start testing the simplemediawiki library...
 * ...but flailed very hard at setting up my tools for it. My portable computer only has Windows working on it right now, so, lessons learned:
 * I already had Python 2.7 installed, but it turned out that I didn't have a package manager. It additionally turned out that pip is ironically difficult to install on Windows.
 * I tried installing setuptools with the installer it came with, and then installing pip with setuptools. When I then tried to use pip to install the simplemediawiki library, I got error messages referencing "egg_info failed", usually associated with a bad package installer.
 * A recommended Windows .tar.gz unzipper is http://www.7-zip.org/. Note that you have to run it twice, once for the .tar and once for the .gz.
 * setup.py to install setuptools, setuptools to install pip, pip to install simplemediawiki, mwclient, and requests. Success!
 * Writing test scripts
 * Started trying to use simplemediawiki to make API calls, initially trying those suggested in the API sandbox.
 * Problems along the way:
 * Figured out that the call function maps very closely to the actual API calls. At first I wasn't clear whether 'action' was a placeholder to be replaced by e.g. 'wbsearchentities', but once I looked at the API documentation I could see that the same arguments the API normally takes are simply passed in as a dict.
 * Figured out not to try Wikidata API calls against the MediaWiki endpoint!
 * Tested queries of various sorts, including ones that returned data on missing pages
 * See representative tests here, with their results
 * API calls with get seemed to be working ok, so I started testing page-editing capabilities
 * Created an account for User:fhocutt bot
 * Tokens were confusing (remembering Python syntax helps; they're not fetched as JSON; also see: http://stackoverflow.com/questions/17730144/getting-a-python-error-attributeerror-dict-object-has-no-attribute-read-t)
 * The documentation on tokens and bots was somewhat helpful: Manual:Edit token, API:Tokens, User-Agent policy, Bot_policy
 * but: http://www.mediawiki.org/w/api.php?action=tokens and http://www.mediawiki.org/w/api.php?action=tokens&type=edit both give me an empty string for tokens, and I can't get sandbox API calls with &action=edit to work because I don't have a token. Trying to use the ones that the script gives User:fhocutt bot yields a badtoken error.
 * I got tokens and 'edit' working with simplemediawiki! See this pastebin and API:Client Code/Access Library Comparison for details.
 * Conclusion based on current work:
 * simplemediawiki makes it easy to make calls pretty directly to the API in a simple Python bot. If I pass it the arguments it expects, it works so far.
 * To do: I haven't tested any POST calls besides edit, so I don't know if login/cookies/tokens work with those.
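The token-then-edit dance described above can be sketched as follows. The param-building helpers are my own; the commented-out simplemediawiki calls use its MediaWiki(url)/login()/call() interface, but the exact token parameters are my reconstruction of the 2014-era API, so treat them as assumptions:

```python
def build_token_request():
    # Assumption: action=tokens&type=edit was the current token fetch in 2014
    return {"action": "tokens", "type": "edit", "format": "json"}

def build_edit_request(title, text, token):
    # The edit must be a POST carrying a token from the same logged-in
    # session, or the API answers with a badtoken error.
    return {"action": "edit", "title": title, "text": text,
            "token": token, "format": "json"}

# With simplemediawiki (sketch, not run here):
# from simplemediawiki import MediaWiki
# wiki = MediaWiki("https://test.wikipedia.org/w/api.php")
# wiki.login("User:fhocutt bot", "password")
# token = wiki.call(build_token_request())["tokens"]["edittoken"]
# result = wiki.call(build_edit_request("Project:Sandbox", "test edit", token))
```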


 * Started mwclient tests
 * Once installed (also fine once I had pip), I looked at the documentation and pretty easily got it working for get calls, though you have to take care with capitalization or you get errors similar to this. Having the variable names in the sample code distinct from the methods available would help users new to Python avoid this. (See: https://wiki.python.org/moin/BeginnerErrorsWithPythonProgramming.)
 * See API:Client Code/Access Library Comparison for details
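A tiny helper illustrating the capitalization pitfall noted above (the helper is my own; the mwclient usage is sketched in comments and not run, since the exact method names varied across versions):

```python
def normalize_title(title):
    """Uppercase the first letter, as MediaWiki does for most page titles."""
    return title[:1].upper() + title[1:]

# With mwclient (sketch, not run):
# import mwclient
# site = mwclient.Site("en.wikipedia.org")
# page = site.pages[normalize_title("sandbox")]   # "Sandbox", not "sandbox"
```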

Resources

 * MediaWiki collaboration tools
 * Wikimedia pastebin
 * Example, shared on IRC with Sumana: https://tools.wmflabs.org/paste/view/1394197e
 * MediaWiki code
 * Bugzilla list of open API bugs
 * Using this search page and searching for "API" yielded no results, but using the search textbox at the upper right corner did
 * Submit a bug


 * Learning styles resources for engineers/scientists
 * Learning styles as used at Hacker School
 * I love that Mel addresses the "but I don't fit into either of these options!" objection, because I thought precisely that at several points on the quiz
 * Quiz to figure your own out
 * my results and reflections on them
 * Description of 4 learning-style spectra


 * MediaWiki API resources
 * Special:APISandbox not Special:API Sandbox
 * API:Client code
 * Project:Sandbox
 * API
 * API:Tutorial
 * the Wikidata API sandbox
 * Extension:Wikibase/API


 * Other MediaWiki resources
 * Manual:Coding conventions/Python


 * Other API resources
 * Google, Ruby, S3 APIs
 * Ch. 1-2 of RESTful Web Services
 * Beginner's guide for journalists who want to understand API documentation Short guide to the idea of APIs and usual documentation, assumes no previous experience with them


 * Test pages/wikis, ok to use for trial edits
 * https://test.wikipedia.org/wiki/Main_Page
 * https://test2.wikipedia.org/wiki/Main_Page
 * Project:Sandbox
 * Not on my bot's talk page...