User:Martyav/Apps/Tutorial

This tutorial will teach you how to use the MediaWiki Action API to build a web app in Flask, a Python framework. Specifically, the app will display the current Wikipedia Picture of the Day.

A complete version of the app is available online; you can download the code from GitHub.

Although this tutorial provides examples walking you through most of the code, it is good to have knowledge of the following items before you begin:


 * Python 3
 * Flask
 * Jinja
 * HTML
 * CSS
 * JSON

Setting up Python
This tutorial uses Python 3. You can download the latest Python version from here:


 * Python for Windows 7, 8, and 10
 * Python for Mac OS X

If your operating system is Windows XP, a Linux distro, or something else, see the Python beginner's guide for further instructions on installation.

Setting up Flask
Pip is a package manager that should have come with your Python installation. If you don't have it already, install it from the official Pip website. Once you've got it, open your command line interface of choice and run pip install flask.

Hello world in Flask
If you have everything successfully installed, the following script should display "Hello world" inside your web browser, at http://localhost:5000/:

App.py
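As a minimal sketch of what such a script can look like (the function name hello_world is an illustrative choice, not necessarily the one used in the original app):

```python
from flask import Flask

# Create the application object; __name__ tells Flask where to look for resources.
app = Flask(__name__)


@app.route("/")
def hello_world():
    # Flask sends the returned string to the browser as the response body.
    return "Hello world"
```

Saved as app.py, this can be started with flask run from the same directory, or by adding a call to app.run() at the bottom of the file. Either way, Flask's development server listens on port 5000 by default.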

Making an API request
Now that we have everything set up and know Flask is working, we can start writing our code for hitting the Action API.

Import Requests
The Action API works by sending back data in response to an HTTP request. We should import the Python Requests library, as it will make our lives easier. Add the line import requests to the top of your file.

If you don't already have this library installed, make sure to install it first by running pip install requests. This also applies to any other Python libraries mentioned in the rest of the tutorial.

The basic GET request structure
Go back to app.py. Remove the old hello-world function, and add this in its place:

App.py

In this code snippet, the call that makes the GET request formats and appends the values inside the parameters dictionary to the endpoint string. This creates a query that looks like this:
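The sketch below illustrates this structure. The names ENDPOINT, PARAMS, build_query_url, and fetch are assumptions made for the example; the English Wikipedia endpoint is used because the rest of the tutorial targets that wiki. The first helper prepares the request without sending it, so you can inspect the final query string.

```python
import requests

# Assumed endpoint: the Action API of English Wikipedia.
ENDPOINT = "https://en.wikipedia.org/w/api.php"

# Every query in this tutorial starts from these two parameters.
PARAMS = {"action": "query", "format": "json"}


def build_query_url(params):
    # Prepare (but do not send) the request, to see the URL Requests would use.
    prepared = requests.Request("GET", ENDPOINT, params=params).prepare()
    return prepared.url


def fetch(params):
    # Send the GET request immediately and decode the JSON body of the response.
    response = requests.get(ENDPOINT, params=params, timeout=10)
    return response.json()


if __name__ == "__main__":
    print(build_query_url(PARAMS))
```

With only these two parameters, the prepared URL is https://en.wikipedia.org/w/api.php?action=query&format=json.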

The GET call sends the query immediately. If it succeeds, we get back the data we wanted, inside the response. If it fails, we usually get an error message within the response, though in some cases, it may fail silently -- for example, if we try to do something without having the proper permissions.

Functions making GET requests to the Action API have a fairly regular structure. This code is only missing one thing: the particular API we are trying to hit.

Modules
The Action API is made up of numerous smaller APIs, or modules; we will use the two terms interchangeably for the remainder of this tutorial. For GET requests within the Action API, these smaller APIs generally fall into two broad categories: props and lists.

Random, a list module
One of the simplest modules to work with is API:Random. Its response is fairly straightforward. Unlike other modules, it returns a single item by default: a randomly selected wiki page.

To hit API:Random, go back inside the request function, and add list: random to the parameters:

App.py
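A minimal sketch of the updated parameters, assuming they are kept in a plain dictionary (the helper name random_params is hypothetical). The standard library's urlencode is used here only to show the query string these parameters produce.

```python
from urllib.parse import urlencode


def random_params():
    # list=random asks the query action for a randomly selected page.
    return {
        "action": "query",
        "format": "json",
        "list": "random",
    }


if __name__ == "__main__":
    print(urlencode(random_params()))
```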

To display the results on the web page, import Python's built-in json library alongside requests:

Also, inside the request function, change the last line:
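One way to make this change, sketched under the assumption that the function should return the decoded response as text, is to serialize it with json.dumps. The helper name and the sample dictionary below are illustrative only; the id value is a placeholder, not a real page id.

```python
import json


def as_display_text(data):
    # Serialize the decoded API response so Flask can return it as a string.
    return json.dumps(data, indent=2)


# Illustrative data shaped like a list=random response (placeholder id).
SAMPLE = {"query": {"random": [{"id": 0, "ns": 0, "title": "Mallabhum Institute of Technology"}]}}

if __name__ == "__main__":
    print(as_display_text(SAMPLE))
```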

Now run app.py. You should see something like this on http://localhost:5000:

Response

The data we requested is within a key which has the same name as the module -- in this case, random. The data is packaged as an array of objects.

Modules that return data about pages don't directly return an address to the page; by default, they generally just return an id and title. The title is the current name for the page, while the id is a unique identifier which stays the same across moves, edits, and renames. We can use either one to build an address to access the page. The structure for this address is the base address for the wiki, then /wiki/, then the title or id of the page.

For the response in our example, the final address to the page would look like this: https://en.wikipedia.org/wiki/Mallabhum_Institute_of_Technology
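As a small sketch of building that address from a title (the helper name page_url_from_title is hypothetical), note that MediaWiki addresses use underscores where the title has spaces:

```python
from urllib.parse import quote

WIKI_BASE = "https://en.wikipedia.org"


def page_url_from_title(title):
    # Replace spaces with underscores, then percent-encode anything unsafe in a URL.
    return WIKI_BASE + "/wiki/" + quote(title.replace(" ", "_"))


if __name__ == "__main__":
    print(page_url_from_title("Mallabhum Institute of Technology"))
```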

Modules that return data about pages also include an additional bit of information -- an ns value. Here, ns stands for namespace.

Namespaces are how MediaWiki broadly classifies pages. Articles, discussion pages, and help pages all belong to different namespaces. Each namespace has a numeric code to identify it. The main namespace, where wiki articles are hosted, is namespace 0. Discussion pages are namespace 1, and help pages are namespace 12.

Images, a prop module
Since we're trying to build an app to view the Wikipedia Picture of the Day, we need to use API:Images. API:Images requires passing some more parameters in the query, and it returns multiple items at once, instead of just one. In addition, unlike API:Random, which is a list, API:Images is a prop, so the response looks a little different. Finally, like most property modules, API:Images relates to the properties of a page: specifically, the images embedded in it.

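To hit API:Images, go back inside the request function, and remove list: random from the parameters.

Add prop: images where list: random once was. Then, underneath this line, add a titles parameter naming the page whose images we want, like so: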

App.py
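A sketch of the resulting parameters follows. The helper name images_params is hypothetical, and the page title "Main Page" is only an assumed example of a page to inspect.

```python
def images_params(page_title):
    # prop=images lists the files embedded in the pages named by titles.
    return {
        "action": "query",
        "format": "json",
        "prop": "images",
        "titles": page_title,
    }


if __name__ == "__main__":
    print(images_params("Main Page"))
```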

Now run app.py again. You should see something like this:

Response

Props return data nested within the pages key, inside another key corresponding to the page id.

Getting at the data inside pages without knowing the page id ahead of time can be tricky. One way around it is to specify the version format of our JSON:

App.py

JSON version 2 returns pages as an array, which is easily indexed into.

Another way to work with pages without explicitly knowing page ids ahead of time is to iterate into it, using Python's built-in next function.
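Both approaches can be sketched against sample data. The two dictionaries below are made-up illustrations of the response shapes (the page id, page title, and file name are placeholders), and the helper names are hypothetical.

```python
# Default format: pages is an object keyed by page id (placeholder values).
SAMPLE_V1 = {"query": {"pages": {"12345": {
    "pageid": 12345, "ns": 0, "title": "Example page",
    "images": [{"ns": 6, "title": "File:Example.jpg"}],
}}}}

# With formatversion=2 in the parameters: pages is an array.
SAMPLE_V2 = {"query": {"pages": [{
    "pageid": 12345, "ns": 0, "title": "Example page",
    "images": [{"ns": 6, "title": "File:Example.jpg"}],
}]}}


def first_page_v1(data):
    # next(iter(...)) takes the first value without knowing its key.
    return next(iter(data["query"]["pages"].values()))


def first_page_v2(data):
    # An array can simply be indexed.
    return data["query"]["pages"][0]


if __name__ == "__main__":
    print(first_page_v1(SAMPLE_V1)["images"][0]["title"])
    print(first_page_v2(SAMPLE_V2)["images"][0]["title"])
```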

Either way, inside this object, we have individual items listed off with their namespace and title. Each item represents one image. The title, in this case, is the file name of the image.

Again, as with API:Random, we only get the title of the image, not the image's address. However, unlike API:Random, it is not simply a matter of appending the file name to a base address to get the right url for the image. If we want the address to display an image, we need to use an additional API -- API:Imageinfo.

Image info, a prop module with module parameters
Because image addresses are more complicated than page addresses, we need to hit API:Imageinfo to get the correct image url.

All modules have special parameters associated with them, which alter what kind of data is returned. For example, API:Imageinfo has iiprop. By default, API:Imageinfo just returns the timestamp and username associated with the last modification of the image. Adding iiprop=url to our query will also retrieve the image's url.

Let's go back to app.py and update it so we're now hitting API:Imageinfo:

App.py

Run app.py, and you should see this:

Response

Here, within the pages element, we see that the id key is -1, because this image lacks a page id. Some images do have unique page ids, but this one does not, perhaps because it is in a shared repository on Wikimedia Commons, instead of directly uploaded to Wikipedia.

The imageinfo returned contains three distinct urls: url, descriptionurl, and descriptionshorturl. The first is the one that we want. It's a direct link to the image itself. The second and third ones link to pages containing some meta-data about the image, such as a brief description, the user who originally uploaded it, and so on.
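The following sketch puts the query and the lookup together. The helper names are hypothetical, and the sample response is an illustration of the shape described above: its file name and example.org addresses are placeholders, not real API output.

```python
def imageinfo_params(filename):
    # iiprop=url adds the image's addresses to the default imageinfo output.
    return {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "iiprop": "url",
        "titles": filename,
    }


# Illustrative response for a file without a local page id (placeholder URLs).
SAMPLE = {"query": {"pages": {"-1": {
    "ns": 6,
    "title": "File:Example.jpg",
    "imageinfo": [{
        "url": "https://example.org/images/Example.jpg",
        "descriptionurl": "https://example.org/wiki/File:Example.jpg",
        "descriptionshorturl": "https://example.org/w/index.php?curid=0",
    }],
}}}}


def image_url(data):
    # Take the only page, whatever its key is, and read the direct image link.
    page = next(iter(data["query"]["pages"].values()))
    return page["imageinfo"][0]["url"]


if __name__ == "__main__":
    print(image_url(SAMPLE))
```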

Picture of the day viewer
Picture of the day, or POTD, is a featured image displayed on the home page of Wikipedia. We'll be hitting an endpoint containing a wiki template that changes every day, and using the data we find there to get at the image and some further information about it.

Getting today's date
The first order of business is simply knowing what day it is. Because POTD updates daily, we need today's date to access the archives and get at a stable version of the correct picture. We're going to import the date class from Python's datetime module.

Create a file, and name it app.py. Go into app.py, and add this line near the top of the file:

Underneath, define a function, named index. We'll be using this function to render the web page soon.

Inside index, add a variable containing the current date in a formatted string:

The formatting call gives us a date string in the format YYYY-MM-DD, which is exactly the format of all dates listed within the Wikipedia POTD archives.

App.py
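A minimal sketch of this step, assuming the date class and its isoformat method are used for the formatting (the variable name todays_date is illustrative):

```python
from datetime import date


def index():
    # isoformat() renders a date as YYYY-MM-DD, for example 2004-05-14.
    todays_date = date.today().isoformat()
    return todays_date


if __name__ == "__main__":
    print(index())
```

At this stage index only returns the string; later steps replace the return value with a rendered page.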

Adding Flask to app.py
Import the following from the flask library: render_template, request, and, of course, Flask.

Look back at the hello-world script from earlier. It contains some additional boilerplate that we need to get our app running.

The line app = Flask(__name__) allows us to call our app later and run it. The decorator @app.route("/") tells our app which route to listen to, and what functions are associated with that route. The call app.run() triggers the app to fire up a local server, which we can use to pass data or render a web page.

After you add this boilerplate to app.py, we'll be ready to make our API calls and do something with them.

App.py

Making the API calls
The Action API works by sending back data in response to an HTTP request. We should import the Python Requests library, as it will make our lives easier.

Add the following line to the top of your file:

Despite the name, this library is distinct from our earlier import, request, which is simply a class that allows Flask to communicate back and forth with the web page it is serving.

Define a new function that fetches the Picture of the Day for a given date.

We access the protected Picture of the Day page to get at the most stable version of the image in the archives. This gives us the information we need to reach the image. However, we don't have the address to the image quite yet.

Define another function that fetches the image's address.

Update the first function to include a call to this second function.

Finally, alter index to call the first function. Make index return a call to render_template. We'll be discussing the meaning of this call in the next section.

App.py
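The two functions might be sketched as follows. Everything named here is an assumption made for illustration: the function names fetch_potd and fetch_image_src, the shape of the returned dictionary, and in particular the page title "Template:POTD protected/" followed by the date, which is taken to be the protected page the text refers to.

```python
import requests

ENDPOINT = "https://en.wikipedia.org/w/api.php"


def potd_params(date_iso):
    # Assumed title of the protected POTD template for a given YYYY-MM-DD date.
    return {
        "action": "query", "format": "json", "formatversion": "2",
        "prop": "images", "titles": "Template:POTD protected/" + date_iso,
    }


def image_src_params(filename):
    return {
        "action": "query", "format": "json",
        "prop": "imageinfo", "iiprop": "url", "titles": filename,
    }


def fetch_image_src(filename):
    # Second call: turn the file name into a direct image address.
    data = requests.get(ENDPOINT, params=image_src_params(filename), timeout=10).json()
    page = next(iter(data["query"]["pages"].values()))
    return page["imageinfo"][0]["url"]


def fetch_potd(date_iso):
    # First call: find the file embedded in the day's POTD template.
    data = requests.get(ENDPOINT, params=potd_params(date_iso), timeout=10).json()
    filename = data["query"]["pages"][0]["images"][0]["title"]
    return {
        "filename": filename,
        "image_src": fetch_image_src(filename),
        "date": date_iso,
    }
```

Inside index, the dictionary returned by fetch_potd would then be handed to render_template. The sketch does no error handling, so a date without a picture would raise a KeyError.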

Displaying the page
The call to render_template generates markup for a web page that contains the information passed along in its arguments. Flask uses a directory named templates to hold files that contain some dynamic elements.

Template directory
Create a new directory, and name it templates. Add a new file to it, and name it index.html.

Flask templates mostly contain HTML markup, but they also use Jinja to render the dynamic elements. Jinja markup looks like this -- {{ variable }} -- and is used to inject Python variables or expressions into our page.

Add some basic HTML 5 boilerplate to index.html, and a few elements. Amongst our HTML scaffold is Jinja syntax indicating where the data from app.py will go:

templates/index.html
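To see how the Jinja placeholders get filled in, here is a small sketch that renders a template string directly with the jinja2 library, which Flask uses under the hood. The markup and the keys date, image_src, and filename are assumptions for the example, not the original index.html.

```python
from jinja2 import Template

# A fragment of what index.html might contain; {{ ... }} marks the dynamic parts.
PAGE = Template(
    "<h1>Picture of the Day for {{ data.date }}</h1>\n"
    '<img src="{{ data.image_src }}" alt="{{ data.filename }}">'
)


def render_page(data):
    # Flask's render_template does the same substitution for files in templates/.
    return PAGE.render(data=data)


if __name__ == "__main__":
    print(render_page({
        "date": "2020-01-01",
        "image_src": "https://example.org/images/Example.jpg",
        "filename": "File:Example.jpg",
    }))
```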

Static directory
Flask uses another directory, named static, to contain any helper files that stay the same throughout the lifecycle of the app. On the same level as app.py and templates, create a new directory, and name it static. Inside this directory, create a new file, and name it style.css.

We'll be using some colors and visual motifs based on the Wikimedia Style Guide.

static/style.css

Postscript 1: Adding a description
We have the filename and image for the daily POTD, but how do we get at the nice descriptive text accompanying it on the Wikipedia page? This is actually trickier than it seems. The core modules of the Action API don't provide a plaintext version of article or template content. The response always contains markup and needs to be parsed to remove it. If you're not up to parsing text from the latest edit, via API:Revisions, there are two ways of getting around this.

Snippets
API:Search can be used to grab a brief text snippet off the page. This snippet is always parsed. Unfortunately, when it comes to Picture of the Day, the text in the snippet may look a little strange -- for example, it may begin in the middle of a sentence. However, since snippets are part of the core functionality of the Action API, you don't have to worry about KeyErrors when attempting to access the text.

The code for accessing the snippet would look like this:

App.py
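A sketch of such a lookup follows. The helper names are hypothetical, the choice to search for the POTD template title is an assumption, and the sample response contains placeholder text rather than real output.

```python
def snippet_params(search_text):
    # list=search returns matching pages, each with a short text snippet.
    return {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": search_text,
        "srlimit": 1,
    }


def first_snippet(data):
    # The snippet may still contain HTML spans highlighting the matched terms.
    return data["query"]["search"][0]["snippet"]


# Illustrative response shape only (placeholder title and text).
SAMPLE = {"query": {"search": [
    {"ns": 10, "title": "Template:Example", "snippet": "placeholder snippet text"},
]}}

if __name__ == "__main__":
    print(first_snippet(SAMPLE))
```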

CirrusSearch
If a wiki has the CirrusSearch extension to API:Search, you can access the text of an article via. A query to do so would look like this:.

The results are of better and more consistent quality than from API:Search snippets, but not all wikis have the CirrusSearch extension installed.

Postscript 2: User interaction
What if we want to view pictures other than the featured picture for today? This information is actually pretty easy to access -- Wikipedia keeps all the Pictures of the Day in an archive, with images dating back to 14 May, 2004. In order to add this feature to the web app, though, we need to change a few things:


 * 1) Add controls to our app, so users can change the date for themselves
 * 2) Style the new controls
 * 3) Alter app.py, to respond to what the user is doing on the page
 * 4) Handle dates that are out of range

Adding controls
Go back to index.html. Inside the date container, add another div, for our new control scheme. We're giving ours the class "date-picker", so we can style it later:

Index.html

Recall that app.py works by responding to GET and POST requests. While we could set up buttons with event handlers to send off POST requests, it's simpler to just use an HTML form, and treat our controls as form inputs, since the form element is designed specifically to send POST requests:

Index.html

Here, our inputs are submit buttons, with two different values, ← Back and Next →. When the form is submitted, the selected value will be passed back to app.py.

Styling the controls
Since we've added new HTML elements, we need to style them.

Go to your css file, and add the following:

style.css

Altering app.py
The first thing we need to do is pull out the date variable from index, so that it is accessible to the rest of the file, and can be updated by other functions. Create a new variable in the module scope, and put it under our constants:

App.py

We want to go backward or forward in time, so we also need to accurately add or subtract from current_date. Python's timedelta allows us to do just that. Add it to our datetime imports:

Define functions for incrementing and decrementing current_date with timedelta:

App.py
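A minimal sketch of the two helpers. For clarity they take the date as an argument and return a new one, rather than updating a module-level current_date as the tutorial's app does; the function names are illustrative.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def increment_date(current):
    # Adding a timedelta to a date handles month and year boundaries for us.
    return current + ONE_DAY


def decrement_date(current):
    return current - ONE_DAY


if __name__ == "__main__":
    print(increment_date(date(2020, 2, 28)))
    print(decrement_date(date(2020, 1, 1)))
```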

Now, our function for changing the date. If the next date is beyond the valid date range, it should not return anything:

App.py
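One possible shape for this check is sketched below. The function name change_date, its signature, and the optional today argument (added so the function can be tested) are assumptions; the earliest date comes from the archive start mentioned above.

```python
from datetime import date, timedelta

# The POTD archive begins on 14 May 2004.
EARLIEST_DATE = date(2004, 5, 14)


def change_date(current, delta_days, today=None):
    # Returns the new date, or None when it would fall outside the archive.
    if today is None:
        today = date.today()
    candidate = current + timedelta(days=delta_days)
    if candidate < EARLIEST_DATE or candidate > today:
        return None
    return candidate


if __name__ == "__main__":
    print(change_date(date(2004, 5, 14), -1))
    print(change_date(date(2004, 5, 14), 1))
```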

Routing is generally how Flask knows which methods to call in response to events. Update our "/" route to handle POST requests:

If "/" receives a GET or POST request, index is called. We already have a code path for GET -- index renders the page. Let's create another code path for the POST from our form:

App.py
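The sketch below shows one way the two code paths can sit in the same view. The form field name change_date is hypothetical, and to stay self-contained the view returns the date as plain text where the real app would call render_template.

```python
from datetime import date, timedelta

from flask import Flask, request

app = Flask(__name__)

# Module-level state, as in the tutorial: the date currently being shown.
current_date = date.today()


@app.route("/", methods=["GET", "POST"])
def index():
    global current_date
    if request.method == "POST":
        # The submitted button's value tells us which direction to move.
        choice = request.form.get("change_date")
        if choice == "← Back":
            current_date = current_date - timedelta(days=1)
        elif choice == "Next →" and current_date < date.today():
            current_date = current_date + timedelta(days=1)
    # GET requests, and POSTs after updating the date, both end up here.
    return current_date.isoformat()
```

Keeping the date in a module-level variable is simple, but it is shared by every visitor to the app, so it only suits a single-user demonstration.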

Communicating that a date is out of range
On the user side, we should add some information to our input buttons, to communicate that the next date is not available to view.

Jinja allows us to add conditional formatting to our page, so that if the next date is out of range, the button will be disabled. However, from inside index.html, we don't have full access to the datetime class. We need a bit of a hack: because we can't create a separate date object to compare against, we must make a comparison against an existing date and its methods:

Index.html
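The comparison can be tried out by rendering a template string with the jinja2 library directly. In this sketch the variable name shown_date and the markup are assumptions; the point is that the template calls today() on the date object it was given, since it cannot construct a date of its own.

```python
from datetime import date, timedelta

from jinja2 import Template

# The button is disabled once the date on display has reached today.
NEXT_BUTTON = Template(
    '<input type="submit" name="change_date" value="Next →"'
    "{% if shown_date >= shown_date.today() %} disabled{% endif %}>"
)


def render_next_button(shown_date):
    return NEXT_BUTTON.render(shown_date=shown_date)


if __name__ == "__main__":
    print(render_next_button(date.today()))
    print(render_next_button(date.today() - timedelta(days=1)))
```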

Since we now need a date object, not a string based on the date, we have to alter the information being passed from app.py:

App.py

Finally, style the inputs to clearly communicate when they can and cannot be clicked to display a new date:

Style.css