User:Martyav/Apps/Tutorial

This is a tutorial for building a simple web app that uses the MediaWiki Action API to access data from Wikipedia. Specifically, the app displays the current Wikipedia Picture of the day.

Sample app
A complete version of the Picture of the Day app is available on GitHub, at MediaWiki Action API Code Samples.

Programming languages & frameworks
Familiarity with the following is helpful, though this tutorial provides snippets that walk you through most of the code:


 * Python 3
 * Flask
 * Jinja
 * HTML
 * CSS
 * JSON

Setting up Python
This tutorial uses Python 3.

You can download the latest Python version from here:


 * Python for Windows 7, 8, and 10
 * Python for Mac OS X

If your operating system is Windows XP, a Linux distro, or something else, see the Python beginner's guide for further instructions on installation.

Setting up Flask
In your command line interface of choice, use pip to install Flask (typically pip install flask).

Note: Pip is a package manager that should have come with your Python installation.

If you don't have it already, see Pip installation on the official website.

Hello world in Flask
Create a new directory for your app, and name it whatever you like. In the GitHub repo, the final version of this app is named Picture of the day viewer.

Open your new directory and create a new file to contain the main code for the app. We'll name ours app.py.

To this file, add the following:
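A minimal hello-world sketch in Flask, assuming the file is named app.py as above. The function name and the greeting text are illustrative choices, not necessarily those of the sample app:

```python
from flask import Flask

# Create the application object; Flask uses the module name to locate resources.
app = Flask(__name__)


@app.route("/")
def hello():
    """Respond to requests for the site root with a plain greeting."""
    return "Hello, world!"
```

To serve the page locally, either call app.run() at the bottom of the file and run it with Python, or start it with the flask command-line tool.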

Save the file, and run it. This will start a local server running the code you've just written. You can view the web page in your browser by visiting http://localhost:5000

Making the API request
Now that we have everything set up and know Flask is working, we can start writing our code for hitting the Action API.

Import Requests
The Action API works by sending back data in response to an HTTP request. Since making HTTP requests is central to our app, we should import the Python Requests library, as it will make our lives easier.

Add the following line to the top of your file:

If you don't already have this library installed, install it with pip first (typically pip install requests).

The basic GET request structure
Now, remove the old function and add this in its place:

In this snippet, the call to the Requests library formats the values inside the parameters dictionary and appends them to the endpoint string, which produces the full query URL.
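As a rough illustration of that encoding step, the sketch below uses only the standard library's urlencode to build the same kind of URL that Requests assembles for us. The endpoint shown is English Wikipedia's api.php address, and the two parameters are the usual base parameters for a JSON query; both are assumptions about what the app sends, and build_query_url is a hypothetical helper name:

```python
from urllib.parse import urlencode

# Assumed endpoint and base parameters for a JSON query.
ENDPOINT = "https://en.wikipedia.org/w/api.php"
params = {"action": "query", "format": "json"}


def build_query_url(endpoint, params):
    """Append URL-encoded parameters to the endpoint as a query string."""
    return endpoint + "?" + urlencode(params)


query_url = build_query_url(ENDPOINT, params)
print(query_url)
```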

The request is sent immediately. If it succeeds, we get back the data we wanted inside the response. If it fails, we usually get an error message within the response, though in some cases it may fail silently, for example if we try to do something without having the proper permissions.

Functions making GET requests to the Action API have a fairly regular structure. This code is only missing one thing: the particular API we are trying to hit.
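That regular structure can be sketched as follows. This is an illustration under stated assumptions rather than the sample app's exact code: the function name fetch, the injectable http_get argument (added only so the function can be exercised without network access), and the User-Agent string are all hypothetical.

```python
import requests

ENDPOINT = "https://en.wikipedia.org/w/api.php"
# Placeholder identification string; replace it with details of your own app.
HEADERS = {"User-Agent": "PictureOfTheDayTutorial/0.1 (example placeholder)"}


def fetch(module_params, http_get=requests.get):
    """Send a GET query to the Action API and return the decoded JSON."""
    # Base parameters shared by every query in this tutorial.
    params = {"action": "query", "format": "json"}
    # Module-specific parameters select the particular API to hit.
    params.update(module_params)
    response = http_get(ENDPOINT, params=params, headers=HEADERS, timeout=10)
    response.raise_for_status()  # surface HTTP-level failures early
    return response.json()
```

Calling fetch({"list": "random"}), for example, would hit the module introduced in the next section.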

Modules
The Action API is made up of numerous smaller APIs, or modules; we will use the two terms interchangeably for the remainder of this tutorial.

For GET requests within the Action API, these smaller APIs generally fall into two broad categories: props and lists.

Random, a list module
One of the simplest modules to work with is API:Random. Its response is fairly straightforward. Unlike other modules, it returns a single item by default: a randomly selected wiki page.

To hit API:Random, go back inside the function and add the list parameter, set to random, to parameters:

To display the results on the web page, import the Python json library alongside requests:

Also, inside the same function, change the last line:

Now run app.py. You should see something like this on http://localhost:5000:

The data we requested is within a key which has the same name as the module, in this case random. The data is packaged as an array of objects.

Modules that return data about pages don't directly return an address to the page; by default, they generally just return an id and a title. The title is the current name for the page, while the id is a unique identifier which stays the same across moves, edits, and renames. We can use either one to build an address to access the page. With a title, the structure for this address is the base address for the wiki, then /wiki/, then the title of the page. An id can be used instead through the curid URL parameter.

For the response in our example, the final address to the page would look like this: https://en.wikipedia.org/wiki/Mallabhum_Institute_of_Technology
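The steps above can be sketched with a hand-written response body. The id in the sample is a placeholder rather than the page's real id, and page_url is a hypothetical helper name:

```python
import json
from urllib.parse import quote

# Illustrative response body; the id is a placeholder, not a real page id.
sample_body = (
    '{"query": {"random": [{"id": 1, "ns": 0, '
    '"title": "Mallabhum Institute of Technology"}]}}'
)


def page_url(title, base="https://en.wikipedia.org/wiki/"):
    """Build a page address from a title, using underscores for spaces."""
    return base + quote(title.replace(" ", "_"))


data = json.loads(sample_body)
first_page = data["query"]["random"][0]  # list modules return an array
print(page_url(first_page["title"]))
```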

Modules that return data about pages also include an additional bit of information: an ns value, where ns stands for namespace.

Namespaces are how MediaWiki broadly classifies pages. Articles, discussion pages, and help pages all belong to different namespaces. Each namespace has a numeric code to identify it. The main namespace, where wiki articles are hosted, is namespace 0. Discussion pages are namespace 1, and help pages are namespace 12.

Images, a prop module
Since we're trying to build an app to view the Wikipedia Picture of the Day, we need to use API:Images. API:Images requires passing some more parameters in the query, and it returns multiple items at once, instead of just one. In addition, unlike API:Random, which is a list, API:Images is a prop, so the response looks a little different. Finally, like most property modules, API:Images relates to the properties of a page: specifically, the images embedded in it.

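To hit API:Images, go back inside the function and remove the list parameter from parameters.

Add the prop parameter, set to images, where the list parameter once was. Then, underneath this line, add a titles parameter naming the page to inspect, like so: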

Now run app.py again. You should see something like this:

Props return data nested within the pages key, inside another key corresponding to the page id.

Getting at the data inside pages without knowing the page id ahead of time can be tricky. One way around it is to specify the version format of our JSON with the formatversion parameter:

JSON version 2 returns pages as an array, which is easily indexed into.

Another way to work with pages without explicitly knowing page ids ahead of time is to iterate into it, using Python's built-in next function.

Either way, inside this object, we have individual items listed off with their namespace and title. Each item represents one image. The title, in this case, is the file name of the image.
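Both approaches can be sketched against hand-written responses. The page id, page title, and file name below are placeholders, and the two helper names are hypothetical:

```python
# One illustrative page object; id, title, and file name are placeholders.
page = {
    "pageid": 12345,
    "ns": 0,
    "title": "Example page",
    "images": [{"ns": 6, "title": "File:Example.jpg"}],
}

# Default JSON format: pages is an object keyed by page id.
version_1 = {"query": {"pages": {"12345": page}}}
# With formatversion=2: pages is an array.
version_2 = {"query": {"pages": [page]}}


def first_page_v1(data):
    """Return the first page when pages is a dict keyed by page id."""
    return next(iter(data["query"]["pages"].values()))


def first_page_v2(data):
    """Return the first page when pages is an array."""
    return data["query"]["pages"][0]


for image in first_page_v1(version_1)["images"]:
    print(image["title"])
```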

Again, as with API:Random, we only get the title of the image, not the image's address. However, unlike API:Random, it is not simply a matter of appending the file name to a base address to get the right url for the image. If we want the address to display an image, we need to use an additional API -- API:Imageinfo.

Image info, a prop module with module parameters
Because image addresses are more complicated than page addresses, we need to hit API:Imageinfo to get the correct image url.

All modules have special parameters associated with them, which alter what kind of data is returned. For example, API:Imageinfo has iiprop. By default, API:Imageinfo just returns the timestamp and username associated with the last modification of the image. Adding iiprop=url to our query will also retrieve the image's url.

Let's go back to our function and update it so we're now hitting API:Imageinfo.

Run app.py, and you should see this:

Here, within the pages element, we see that the id key is -1, because this image lacks a page id. Some images do have unique page ids, but this one does not, perhaps because it is in a shared repository on Wikimedia Commons, instead of directly uploaded to Wikipedia.

The imageinfo returned contains three distinct urls: url, descriptionurl, and descriptionshorturl. The first is the one that we want. It's a direct link to the image itself. The second and third ones link to pages containing some metadata about the image, such as a brief description, the user who originally uploaded it, and so on.
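A short sketch of pulling the direct link out of such a response. The sample below is hand-written: the file name and all three addresses are placeholders on example.org, and image_src is a hypothetical helper name:

```python
# Illustrative response shape; every value here is a placeholder.
sample = {
    "query": {
        "pages": {
            "-1": {
                "ns": 6,
                "title": "File:Example.jpg",
                "imageinfo": [
                    {
                        "url": "https://example.org/direct/Example.jpg",
                        "descriptionurl": "https://example.org/wiki/File:Example.jpg",
                        "descriptionshorturl": "https://example.org/short/Example",
                    }
                ],
            }
        }
    }
}


def image_src(data):
    """Return the direct image address from an imageinfo response."""
    # The key may be "-1" or a real page id, so take whichever page comes first.
    page = next(iter(data["query"]["pages"].values()))
    return page["imageinfo"][0]["url"]


print(image_src(sample))
```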

Displaying the page
We have our code for hitting the Action API, but it's not quite a full app yet. When we run it, all it does is display some raw JSON.

Template directory
Create a new directory, named templates. Flask uses templates to hold files that contain some dynamic elements. In our case, we'll be placing the index page of our app here, because we'll be dynamically updating it to display the results from our API calls. Flask templates use Jinja to render the dynamic elements. Jinja markup wraps an expression in double curly braces, as in {{ variable }}, and is used to inject Python variables or expressions into our page.

Static directory
Create another new directory, named static. Flask uses static to contain any helper files that stay the same throughout the lifecycle of the app. We'll add a CSS file to style the page here.

Adding the web page
Let's create a new file inside the templates directory. We'll name it index.html. Add some basic HTML 5 boilerplate to it, and a few elements. Amongst our HTML scaffold is Jinja syntax indicating where the data from app.py will go:

Adding the styling
Create a new file in the static directory, and name it style.css. We'll be using some colors and visual motifs based on the Wikimedia Style Guide.

First, the basic page elements:

Next, some styling for a new CSS class wrapping around our picture. This will make the app look more polished:

Don't forget to reference this new file inside the head of index.html.

Rendering the page
We need to connect index.html to app.py.

At the top of app.py, import the render_template function alongside the Flask class:

Create a new function alongside the existing one. Since it's rendering the index page, we're naming it index.

Now the app knows about the web page, and will pass the response from our API function to the template. However, if we were to run app.py now, we'd see all our HTML tags and CSS styling decorating the raw JSON string we're dumping from the response. We need to package our data in an easily accessible way, so the template can generate HTML elements from it.

Go back inside the API function, remove the current return statement, and add these lines in its place:

Inside index.html, update the Jinja:

We are now dynamically rendering the picture and its title from our API call.

Picture of the day viewer
The final sections of this tutorial relate how to make our app work with Wikipedia:Picture of the day. Picture of the day, or POTD, is a featured image displayed on the home page of Wikipedia. We'll be hitting an endpoint containing a wiki template that changes every day, and using the data we find there to get at the image and some descriptive text about it.

Getting today's date
The first order of business is simply knowing what day it is. Because POTD updates daily, we need today's date to access the archives and get at a stable version of the correct picture. We're going to import another native Python library.

Go into app.py, and add this line near the top of the file:

Then, in index, add a variable containing the current date:

This call gives us a date string in the format YYYY-MM-DD, which is the exact format in which dates are listed within the Wikipedia POTD archives.
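One way to produce such a string, assuming the standard library's date class is the import in question, is isoformat; strftime("%Y-%m-%d") would give the same result:

```python
from datetime import date

# Today's date as an ISO 8601 string, e.g. "2004-05-14" for 14 May 2004.
current_date = date.today().isoformat()
print(current_date)

# A fixed date formats the same way, which makes the pattern easy to check.
first_potd = date(2004, 5, 14).isoformat()
```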

Refactoring app.py
We're now going to re-arrange a few things inside app.py.

First, pull the two reusable variables out of the function and into the top level of the file. They're now constants in module scope, so change their variable names to be in all-caps, as described in the Python style guide, PEP 8:

Second, to get all the data we need, we'll be making several Action API calls, so having a single catch-all function doesn't make sense anymore. Delete it and define a new function for fetching the Picture of the Day:

We access the protected Picture of the Day page to get at the most stable version of the image in the archives. This gives us the information we need to reach the image. However, we don't have the address to the image quite yet.
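As a sketch of the query such a function might send: the title pattern below, "Template:POTD protected/" followed by the ISO date, is an assumption about how the protected daily pages are named, and potd_params is a hypothetical helper name.

```python
from datetime import date


def potd_params(day):
    """Build query parameters for the protected POTD page of a given day."""
    # Assumed naming pattern for the protected daily template.
    title = "Template:POTD protected/" + day.isoformat()
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",  # pages comes back as an array
        "prop": "images",
        "titles": title,
    }


params = potd_params(date(2019, 1, 1))
print(params["titles"])
```

The file name found in the response to this query is what the image lookup in the next step needs.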

Define another function, which fetches the image's address:

Update the Picture of the Day function to include a call to this function:

Finally, alter index to call the Picture of the Day function:

Adding a link and date
We'll be linking back to the Picture of the Day's archived description, and also displaying the current picture's date.

Update the div wrapping our picture inside index.html:

Underneath that div, add a new div to contain the current date:

Now, inside our CSS file, add some styling for the new elements on the card:

Postscript 1: Adding a description
We have the filename and image for the daily POTD, but how do we get at the nice descriptive text accompanying it on the Wikipedia page? This is actually trickier than it seems at first glance. The core modules of the Action API don't provide a plaintext version of article or template content. The response you get always needs to be parsed, and may contain wiki markup or HTML tags. If you're not up to parsing text from the latest edit, via API:Revisions, there are two ways of getting around this.

Snippets
API:Search can be used to grab a brief text snippet off the page. This snippet is always parsed already. Unfortunately, when it comes to Picture of the Day, the text in the snippet isn't always high quality, and the snippet string itself may be empty. For example, the snippet may begin in the middle of a sentence. However, since snippets are part of the core functionality of the Action API, you don't have to worry about KeyErrors, and the majority of snippets are adequate as preview text.

The code for accessing the snippet would look like this:
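A minimal sketch, assuming the search is run through list=search and that only the first result matters. The helper names and the sample response are hypothetical:

```python
def snippet_params(search_term):
    """Build query parameters asking API:Search for a single result."""
    return {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": search_term,
        "srlimit": 1,
    }


def first_snippet(data):
    """Return the first result's snippet, or an empty string if none came back."""
    results = data.get("query", {}).get("search", [])
    return results[0].get("snippet", "") if results else ""


# Hand-written sample response; the text is a placeholder.
sample = {"query": {"search": [{"title": "Example", "snippet": "placeholder text"}]}}
print(first_snippet(sample))
```

Returning an empty string for an empty result set lets the template decide whether to show a description at all.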

CirrusSearch
Wikis that have the CirrusSearch extension to API:Search allow you to access the text of an article directly. A query to do so would look like this:

Note that, while most of the markup is stripped out of the text in the API response, some escape characters will be included.

The results are of better and more consistent quality than those from API:Search snippets, but not all wikis have the CirrusSearch extension installed.

Postscript 2: User interaction
What if we want to browse through the Picture of the Day archives? This information is actually pretty easy to access. Wikipedia keeps all the Pictures of the Day in an archive, with images dating back to 14 May, 2004. If we were simply curious about other dates, all we would need to do to our code is change it to point to any valid POTD date. In order to access different dates via the web app's interface, though, we need to change a few more things:


 * 1) Add controls to our app, so users can change the date for themselves
 * 2) Style the new controls
 * 3) Alter app.py, to respond to what the user is doing on the page
 * 4) Handle dates that are out of range

Adding controls
Go back to index.html. Inside the date container, add another div, for our new control scheme. We're giving ours the class, "date-picker," so we can style it later:

Recall that app.py works by responding to GET and POST requests. While we could set up buttons with event handlers to send off requests, it's simpler to just use an HTML form, and treat our controls as form inputs:

Here, our inputs are submit buttons, with two different values, "← Back" or "Next →". When the form is submitted, the values will be passed back to app.py.

Styling the controls
Since we've added new HTML elements, we need to style them.

Go to your css file, and add the following:

Altering the date
The first thing we need to do is pull out the date variable from index, so that it is accessible to the rest of the file, and can be updated by other functions. Create a new variable in the module scope, and put it under our constants:

We want to go backward or forward in time, so we also need to accurately add or subtract from current_date. Python's timedelta allows us to do just that. Add it to our datetime imports:

Define functions for incrementing and decrementing current_date with timedelta:

Now, our incrementing function. If the next date is beyond the valid date range, it should not return anything:
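The date arithmetic can be sketched with pure functions. Unlike the approach described above, this version takes the date as an argument and returns None when a step would leave the valid range, rather than updating a module-level variable; the function names and the FIRST_POTD constant are hypothetical.

```python
from datetime import date, timedelta

# The archive's earliest date, as mentioned earlier in this tutorial.
FIRST_POTD = date(2004, 5, 14)


def decrement(current, earliest=FIRST_POTD):
    """Return the previous day, or None if it falls before the archive starts."""
    previous = current - timedelta(days=1)
    return previous if previous >= earliest else None


def increment(current, latest=None):
    """Return the next day, or None if it falls after the latest valid date."""
    latest = latest or date.today()
    following = current + timedelta(days=1)
    return following if following <= latest else None
```

Passing the date in explicitly keeps the functions easy to test; the route handler can then decide whether to store the result.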

Updating a route for our form
Routing is generally how Flask knows which methods to call in response to events. We'll be updating our "/" route to handle POST requests.

If "/" receives a GET or POST request, index is called. Let's add another code path for the POST:
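The POST branch mostly comes down to mapping the submitted button value onto a date change. The sketch below isolates that decision in a plain function so it can be read without the Flask plumbing. The input name change_date is a hypothetical choice, and form stands in for any mapping with a get method, such as Flask's request.form:

```python
from datetime import date, timedelta


def date_after_submit(form, current, today):
    """Pick the date to display after one of the form buttons is pressed."""
    action = form.get("change_date")  # hypothetical input name
    if action == "← Back":
        return current - timedelta(days=1)
    if action == "Next →":
        candidate = current + timedelta(days=1)
        # Stay on the current date if the next one is out of range.
        return candidate if candidate <= today else current
    # Unknown or missing value: leave the date unchanged.
    return current
```

Inside the route, this would be called only when the request method is POST, with the result passed on to the function that fetches the picture.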

Communicating that a date is out of range
On the user side, we should add some information to our input buttons, to communicate that the next date is not available to view.

Jinja allows us to add conditional formatting to our page, so that if the next date is out of range, the button will be disabled. However, from inside index.html, we don't have full access to the datetime class. We need a bit of a hack: because we can't create a separate date object to compare against, we must make a comparison against an existing date and its methods:

Since we now need a date object, not a string based on the date, we need to alter the information being passed from app.py:

Finally, style the inputs to clearly communicate when they can and cannot be clicked to display a new date: