User:Martyav/Apps/Tutorial

This is a tutorial for building a simple web app that uses the MediaWiki Action API to access data from Wikipedia. Specifically, the app displays the current Wikipedia Picture of the day. If you are confused by the terminology used in this tutorial, consult our glossary for basic definitions.

Programming languages
This app uses a web framework called Flask. It allows us to write our Action API requests in Python.

There is also some template code, written in HTML, for automatically generating a web page based on the results from our request. There is an accompanying CSS file for styling the web page.

Finally, there is some Javascript code handling user interactions, such as sending the GET request once the user clicks a button, and dynamically updating the results.

Setting up Python
This tutorial uses Python 3.

You can download the latest Python version from here:


 * Python for Windows 7, 8, and 10
 * Python for Mac OS X

If your operating system is Windows XP, a Linux distro, or something else, see the Python beginner's guide for further instructions on installation.

Installing Python 3 shouldn't overwrite or remove any Python 2 installation you may currently have, and you'll still be able to run older software that depends on Python 2. In most cases, having multiple versions of Python won't cause any conflicts, though you may need to specify the version you want when running scripts.

See Can I install Python 3.x and 2.x on the same Windows computer, Switch between python 2.7 and python 3.5 on Mac OS X, and Install and run Python 3 at the same time as Python 2 (Ubuntu Linux) for your respective operating system.

Setting up Flask
There is a tool called pip, which should have come with your Python installation. You can check by typing pip --version into your command line interface. Pip lets us easily download and install the Python libraries and frameworks we need.

If you don't have it already, see Pip installation on the official website.

With pip installed, all you need to do is type pip install flask into your command line to get the latest version of Flask.

Hello world in Flask
Create a new directory for your app, and name it whatever you like. If you visit the GitHub repo, the final version of this app is named Picture of the day viewer.

Open your new directory and create a new file to contain the main code for the app. We'll name ours app.py.

To this new file, add the following:
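A minimal sketch of such a file, assuming the standard Flask quickstart pattern rather than this tutorial's exact listing. The names APP and hello_world are illustrative choices:

```python
from flask import Flask

# APP and hello_world are illustrative names, not taken from the tutorial.
APP = Flask(__name__)


@APP.route("/")
def hello_world():
    """Return the plain-text body shown at http://localhost:5000."""
    return "Hello world!"


if __name__ == "__main__":
    # Starts Flask's development server on its default port, 5000.
    APP.run()
```

The guard at the bottom means the server only starts when the file is run directly, not when it is imported.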

Save the file, and run it from your command line application by typing python app.py. This will start a local server, running the code you've just written. You can view the web page in your browser by visiting http://localhost:5000

If Flask is successfully installed, you should see the words "Hello world!" displayed in the default font, on a mostly blank web page:

Making the API query
Now that we have everything set up and know Flask is working, we can start writing our code for hitting the Action API.

Import Requests
The Action API works by sending back data in response to an HTTP request. Since making HTTP requests is so important to the key functionality of our app, we should import the Python Requests library, as it will make our lives easier.

Add an import for it, import requests, at the top of your file.

If you don't already have the library installed, install it with pip from your command line application, by typing pip install requests.

The basic GET request structure
Now, remove the old hello-world function, and add this in its place:

The call to session.get appends the key-value pairs from the parameters to the end of the endpoint string as a query string, so the whole request is expressed as a single URL.

session.get sends the query immediately. If it succeeds, we get back the data we wanted, inside response. If it fails, we usually get an error message within the response, though in some cases, it may fail silently -- for example, if we try to do something without having the proper permissions.
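The structure described above can be sketched as follows. This is an assumption-laden outline, not the tutorial's original listing: the English Wikipedia endpoint and the two base parameters (action and format) are assumed, and build_query_url is a hypothetical helper added only to show the URL that Requests assembles without sending anything:

```python
import requests


def build_query_url(endpoint, parameters):
    """Return the URL Requests would send, without making the request."""
    prepared = requests.Request("GET", endpoint, params=parameters).prepare()
    return prepared.url


def action_api():
    """Send a bare query to the Action API and return the decoded JSON."""
    session = requests.Session()
    # Assumed endpoint: the Action API of English Wikipedia.
    endpoint = "https://en.wikipedia.org/w/api.php"
    PARAMETERS = {
        "action": "query",
        "format": "json",
    }
    response = session.get(url=endpoint, params=PARAMETERS)
    return response.json()


if __name__ == "__main__":
    # Prints https://en.wikipedia.org/w/api.php?action=query&format=json
    print(build_query_url(
        "https://en.wikipedia.org/w/api.php",
        {"action": "query", "format": "json"},
    ))
```

Only action_api touches the network; build_query_url just shows how the parameters end up in the query string.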

Functions making GET requests to the Action API have a fairly regular structure. This code is only missing one thing: the particular API we are trying to hit.

Modules
The Action API is made up of numerous smaller APIs, or modules; we will use the two terms interchangeably for the remainder of this tutorial.

For GET requests within the Action API, these smaller APIs generally fall into two broad categories: props and lists.

Props & lists
To make a GET request hit the particular API you want, add a prop or list entry naming that module to PARAMETERS.

In many cases, prop modules are used for accessing data within a single page, while list modules are used for accessing data across an entire wiki; however, there is overlap. The main difference is how they structure the data they return, within the response.

Prop modules return data nested within an object in the pages element of the response, keyed by page id, while list modules return data directly inside the query element, under a key named after the module.

For developers, all this means is that you need to access a different place within the JSON to get at the data: either query, then pages, for props, or query, then the name of the module, for lists.
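To make the difference concrete, here is a small sketch that reads both shapes. The two responses are hand-written illustrations with made-up ids and titles, and prop_data and list_data are hypothetical helper names:

```python
# Illustrative responses only; the ids and titles are made up.
PROP_RESPONSE = {
    "query": {
        "pages": {
            "736": {
                "pageid": 736,
                "ns": 0,
                "title": "Example page",
                "images": [{"ns": 6, "title": "File:Example.jpg"}],
            }
        }
    }
}

LIST_RESPONSE = {
    "query": {
        "random": [{"id": 123, "ns": 0, "title": "Example page"}]
    }
}


def prop_data(response):
    """Prop modules: data sits under query, then pages, keyed by page id."""
    return response["query"]["pages"]


def list_data(response, list_name):
    """List modules: data sits under query, in a key named after the module."""
    return response["query"][list_name]
```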

Limits
Both props and lists have limits on the amount of data they can return in one go. The maximum number of items allowed in the response at one time depends on your account. Bots and sysops have a maximum limit of 5000, while ordinary users have a maximum limit of 500.

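The default limit for most modules is 10 items. You can see more data by making the same query with the module's limit parameter added. Its name is the module's parameter prefix followed by limit, such as rnlimit for API:Random or imlimit for API:Images.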
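As a rough sketch of how a limit could be added to a query, assuming the ordinary-user maximum of 500 described above. The helper with_limit and the constant MAX_USER_LIMIT are hypothetical names, not part of the API:

```python
# Assumed cap for ordinary users, as described in the text above.
MAX_USER_LIMIT = 500


def with_limit(parameters, prefix, limit):
    """Return a copy of parameters with the module's limit parameter set.

    prefix is the module's parameter prefix, for example "rn" for
    API:Random or "im" for API:Images.
    """
    clamped = max(1, min(limit, MAX_USER_LIMIT))
    updated = dict(parameters)
    updated[prefix + "limit"] = str(clamped)
    return updated
```

Copying the dictionary keeps the original PARAMETERS unchanged, so the same base query can be reused with different limits.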

Example: Random, a list module
One of the simplest modules to work with is API:Random. Its response is fairly straightforward. Unlike other modules, it returns a single item by default: a randomly selected wiki page.

To hit API:Random, go back inside the function, action_api, and add a list entry with the value random to PARAMETERS.

Now run app.py.

You should see something like this on http://localhost:5000:

Understanding the response
The data we requested is within a key which has the same name as the module -- in this case, random. The data is packaged as an array of objects.

Modules that return pages don't directly return an address to the page; by default, they generally just return an id and title. The title is the current name for the page, while the id is a unique identifier which stays the same across moves, edits, and re-names. We can use either one to build a URI to access the page.

The structure for this URI is the base address for the wiki, then /wiki/, then the title of the page, with spaces replaced by underscores. So, for our response, the final address to the page would look like this: https://en.wikipedia.org/wiki/Mallabhum_Institute_of_Technology
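The following sketch pulls these pieces together for API:Random. The sample response is hand-written: the title matches the address above, but the id is a made-up placeholder. The helper names, and the use of index.php with curid for id-based links, are assumptions for illustration:

```python
from urllib.parse import quote

WIKI_BASE = "https://en.wikipedia.org"

# Query parameters for API:Random, as described in the text.
PARAMETERS = {"action": "query", "format": "json", "list": "random"}

# Hand-written sample; the id is a made-up placeholder.
SAMPLE_RESPONSE = {
    "query": {
        "random": [
            {"id": 1234567, "ns": 0, "title": "Mallabhum Institute of Technology"}
        ]
    }
}


def page_url_from_title(title):
    """Build a page address from its title, using underscores for spaces."""
    return WIKI_BASE + "/wiki/" + quote(title.replace(" ", "_"))


def page_url_from_id(page_id):
    """Build a page address from its id, which survives moves and renames."""
    return WIKI_BASE + "/w/index.php?curid=" + str(page_id)
```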

Modules that return pages also include one additional bit of information -- an ns value. Ns stands for namespace. Namespaces are how MediaWiki broadly classifies pages. Articles, discussion pages, and help pages all belong to different namespaces.

Each namespace has a numeric code to identify it. The main namespace, where wiki articles live, is namespace 0. Discussion pages are namespace 1, and help pages are namespace 12.

Example: Images, a prop module
Since we're trying to build an app to view the Wikipedia Picture of the Day, we need to use API:Images.

API:Images is a little more complicated than API:Random. It requires passing some more parameters in the query, and it returns multiple items at once.

In addition, unlike API:Random, which is a list, API:Images is a prop, so the response from it looks a little different.

Finally, like most property modules, API:Images relates to the properties of a page: namely, the images embedded in it. We could specify multiple pages, but for this example, we'll only query one.

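To hit API:Images, go back inside the function, action_api, and remove the list entry from PARAMETERS.

Add a prop entry with the value images where the list entry once was. Then, underneath this line, add a titles entry naming the page you want to query.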

Now run app.py again. You should see something like this:

Understanding the response
Props return data nested within an element corresponding to the page id. Getting at the data inside pages without knowing the page id ahead of time can be tricky.

One way around it is to specify the version format of our JSON, by adding a formatversion entry with the value 2 to PARAMETERS.

JSON version 2 returns pages as an array, which is easily indexed into.

Another way to work with pages without explicitly knowing page ids ahead of time is to iterate into it, using Python's built-in next function.

Either way, inside this object, we have individual items listed off with their namespace and title. Each item represents one image. The title, in this case, is the file name of the image.
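Both approaches can be sketched against hand-written responses. The page title, page id, and file name below are made-up placeholders, and the helper names are hypothetical:

```python
def images_parameters(page_title, use_version_2=False):
    """Build the query parameters for API:Images on a single page."""
    parameters = {
        "action": "query",
        "format": "json",
        "prop": "images",
        "titles": page_title,
    }
    if use_version_2:
        parameters["formatversion"] = "2"
    return parameters


# Illustrative page object; the id, title, and file name are made up.
PAGE = {
    "pageid": 736,
    "ns": 0,
    "title": "Example page",
    "images": [{"ns": 6, "title": "File:Example.jpg"}],
}

# Default JSON: pages is an object keyed by page id.
LEGACY_RESPONSE = {"query": {"pages": {"736": PAGE}}}
# JSON version 2: pages is an array.
VERSION_2_RESPONSE = {"query": {"pages": [PAGE]}}


def first_page_legacy(response):
    """Take the first page without knowing its id, using next."""
    return next(iter(response["query"]["pages"].values()))


def first_page_version_2(response):
    """Index straight into the pages array."""
    return response["query"]["pages"][0]
```

Either helper returns the same page object, so the rest of the code does not need to care which format was requested.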

Again, as with API:Random, we only get the title of the image, not the image's address. However, unlike API:Random, it is not simply a matter of appending the file name to a base address to get the right URI for the image. If we want the address to display an image, we need to use an additional API -- API:Imageinfo.

Example: Image info, with module parameters
Because image addresses are more complicated than page addresses, we need to hit API:Imageinfo to gain access to where the image is hosted.

All modules have special parameters associated with them, which alter what kind of data is returned. For example, API:Imageinfo has iiprop. By default, API:Imageinfo just returns the timestamp and username associated with the last modification of the image. Setting iiprop to url in our query will retrieve the image's url.

Let's go back to PARAMETERS and update it so we're now hitting API:Imageinfo.

Run app.py, and you should see this:

Understanding the response
Here, within the pages element, we see that the id key is -1, because this image lacks a page id. Some images do have unique page ids, but this one does not, perhaps because it is in a shared repository on Wikimedia Commons, instead of directly uploaded to Wikipedia.

The imageinfo returned contains three URIs: url, descriptionurl, and descriptionshorturl. The first is the one that we want. It's a direct link to the image itself. The second and third ones link to pages containing some meta-data about the image, such as a brief description, the user who originally uploaded it, and so on.
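A sketch of the query and of pulling the direct link out of the response. The sample response is hand-written, and its file name and three addresses are illustrative placeholders rather than real output. The helper names are hypothetical:

```python
def imageinfo_parameters(file_title):
    """Build the query parameters for API:Imageinfo, asking for urls."""
    return {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "iiprop": "url",
        "titles": file_title,
    }


# Hand-written sample; the addresses are placeholders, not real output.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "-1": {
                "ns": 6,
                "title": "File:Example.jpg",
                "imagerepository": "shared",
                "imageinfo": [
                    {
                        "url": "https://example.org/images/Example.jpg",
                        "descriptionurl": "https://example.org/wiki/File:Example.jpg",
                        "descriptionshorturl": "https://example.org/w/index.php?curid=0",
                    }
                ],
            }
        }
    }
}


def extract_image_url(response):
    """Return the direct link to the image, whatever the page key is."""
    page = next(iter(response["query"]["pages"].values()))
    return page["imageinfo"][0]["url"]
```

Using next here avoids hard-coding the -1 key, so the same function works for images that do have a page id.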

We now have the information we need to build a Flask app displaying the Wikipedia Picture of the Day.

Displaying the page
We should now have a file containing code that looks like this:

We have our code for hitting an API, but it's not quite a full app yet. When we run it, all it does is display some raw JSON. Our code also isn't flexible enough yet to correctly query the Wikipedia Picture of the Day. We'll have to combine several API calls to display the picture. Finally, we can only make a request when we open or refresh the page.

We're going to have to add a few more files, and import some things from Flask to get it all working together.

Adding the HTML file
Let's create a new file. We'll name it index.html. Add some basic HTML 5 boilerplate to it, a few elements, and save it in the same directory as app.py.

Rendering the page
If you were to run app.py at this point, you wouldn't see anything different. The app doesn't know about this file yet. We need to connect the two files.

At the top of app.py, alter the first line to also import render_template. Then, inside of action_api, replace the existing return value with a call to render_template that names index.html.

Now the app knows about the web page -- but it returns an error.

Template directory
Flask expects to see a directory named templates in our app. If it's not there, it won't know what to do when we call render_template.

Templates is where we put files which contain some dynamic elements. In our case, we place index.html in there because we'll be dynamically updating it to display the results from our API call.

Although Flask uses the Jinja template engine to render templates, plain HTML will also work fine. We leave re-writing the page in Jinja syntax as an exercise for the reader.

After creating a templates directory, putting index.html inside, and getting render_template to run, we now see a mostly blank page, displaying Picture of the day: in big h1 letters.

Static directory
We have a web page. We have an API call. Both are working...but only one or the other is being displayed. How do we update the web page to display the API call?

The Python in app.py can't do so directly. We need to create a script file, to hold some Javascript that will update the page for us. Flask uses a directory named static to contain any helper files that stay the same throughout the lifecycle of the app. We'll be adding our Javascript file there. We'll also add a CSS file to style the page there, later.

Adding the Javascript
Inside the static directory, create a new file and name it what you like. Ours is named script.js.

Now, go back to the Javascript file. For the purposes of this tutorial, we'll eschew any Javascript libraries or frameworks, such as React or jQuery. However, we will be using some ES6 syntax. If you are using a very old browser, such as Internet Explorer 10, replace it with the ES5 equivalents.

The first line in our script file will be a method wrapping around the rest of the code, ensuring that the script doesn't run until after the page has loaded. The way to do this in vanilla Javascript is like so:

Inside this method, create a variable to target the potd element on the page. This will be where we display the picture.

Underneath, create another variable to hold on to a new XMLHttpRequest. This variable will help us communicate with our Python code. Once the page loads, it will fire off a POST request against our page. This will serve as a signal for app.py. When the app makes its GET request and retrieves data, the variable will also allow us to know if our request has finished, and if it succeeded or failed. Upon a successful request, it will give us access to the JSON from the call:

This code includes some magic numbers: 4 and 200. A readyState of 4 means a request has completed, while status code 200 indicates that it was a success.

We will now access the data in the response, and update the page by setting the HTML content of the potd element:

One last thing: we need to update app.py to respond to the Javascript POST request.

Connecting app.py with our Javascript
Back inside app.py, let's import Flask's request object, as well as jsonify, a tool that will ensure that our response is passed as JSON instead of a string:

Now, add POST requests to our recognized routes:

Let's create a new function inside app.py, specifically for rendering the page and responding to changes:

Let's also alter action_api. We'll make it specifically retrieve the image's address, and return it as valid JSON:
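The Python side of this exchange might be sketched as below. To keep the example runnable on its own, action_api is replaced with a hypothetical stub returning a placeholder address, and an inline HTML string stands in for render_template and index.html. The image_src key is an assumed name:

```python
from flask import Flask, jsonify, request

APP = Flask(__name__)

# Stand-in for templates/index.html so the sketch runs without extra files.
PAGE_HTML = '<h1>Picture of the day:</h1><div id="potd"></div>'


def action_api():
    """Hypothetical stub; the real function would query the Action API."""
    return {"image_src": "https://example.org/potd.jpg"}


@APP.route("/", methods=["GET", "POST"])
def index():
    """Serve the page on GET, and the image data as JSON on POST."""
    if request.method == "POST":
        return jsonify(action_api())
    return PAGE_HTML
```

Branching on request.method lets one route both render the page and answer the script's POST request.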

We now have the Python and the Javascript communicating together correctly.

Adding the CSS
We're almost ready to render the page.

Create a new file in the static directory, and name it what you like. Ours is called style.css. We'll be adding some colors and visual motifs based on the Wikimedia Style Guide.

First, the basic page elements:

Now, some styling for a new CSS class wrapping around our picture. This will make the app look more polished:

Connecting index.html with our Javascript and CSS
Finally, we will need to reference the stylesheet and script on our web page before any of the magic happens. Update the head element with references to our files, and add a new div referencing the new CSS class:

Understanding the app lifecycle
We now have four files:

app.py

/static/script.js

/templates/index.html

/static/style.css

When we run app.py and render the page, the Javascript file starts up. It waits to fire until the page is fully loaded. The first thing it fires off is a POST request, aimed at the page itself.

App.py is waiting for this POST request. It indicates that the page is now loaded and ready to accept data. App.py fires off a GET request to the Action API. It returns the data in JSON format.

The Javascript is still active and monitoring the request. It sees that the request is done. It also sees that the request succeeded. It parses whatever is inside the response from the request. This currently contains the JSON we just packaged in app.py.

The Javascript now takes the JSON, retrieves the data, adds HTML tags around it, and appends it to the potd div on our page.

Until now the HTML file has just been displaying whatever we initially wrote into it. Now, with the changes to the potd div, it updates and re-renders the page with our new tags and styling.

Picture of the day viewer
The final sections of this tutorial relate how to make our app work with Wikipedia:Picture of the day.

Picture of the day, or POTD, is a featured image displayed on the home page of Wikipedia.

We'll be hitting an endpoint containing a wiki template that changes every day, and using the data we find there to get at the image and some descriptive text about it.

Getting today's date
The first order of business is simply knowing what day it is.

Because POTD updates daily, we need today's date to access the archives and get at a stable version of the correct picture.

We're going to import another module, this time from the Python standard library.

Go into app.py, and add this line near the top of the file:

Then, in index, add a variable containing the current date:

We generate a string from date.today, because it actually returns a date object rather than a string.

todays_date is in the format YYYY-MM-DD, which thankfully is the exact format in which dates are listed within the Wikipedia POTD archives.
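A short sketch of producing that string with the standard library. The helper name potd_date_string is hypothetical, and its optional argument is an addition for illustration, so that other dates can be formatted too:

```python
from datetime import date


def potd_date_string(day=None):
    """Return a date as YYYY-MM-DD, defaulting to today's date."""
    if day is None:
        day = date.today()
    # isoformat on a date object always yields YYYY-MM-DD.
    return day.isoformat()
```

For example, potd_date_string(date(2019, 1, 5)) returns "2019-01-05".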

Refactoring app.py
We're now going to re-arrange a few things inside app.py.

To get all the data we need, we'll be making several Action API calls, so having a single function named action_api doesn't make sense anymore.

First, pull the variables session and endpoint out of action_api, and define them at the top level of the module instead. They're constants in the module scope, so change their variable names to be in all-caps, as described in the Python style guide, PEP 8:

Next, define a new function, fetch_potd(date):

We access the protected POTD page to get at the most stable version of the image in the archives. This gives us the information we need to reach the image. However, we don't have the address to the image quite yet.

Define another function, named fetch_image_url:

Update fetch_potd to include a call to this function:

Finally, alter index to call fetch_potd:
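One possible shape for the two functions is sketched below, under several assumptions: the title pattern for the protected POTD template, the keys of the returned dictionary, and the optional session argument (added so the functions can be exercised without network access) are all illustrative rather than taken from the tutorial's listing:

```python
import requests

SESSION = requests.Session()
ENDPOINT = "https://en.wikipedia.org/w/api.php"


def fetch_image_url(file_title, session=SESSION):
    """Ask API:Imageinfo where the named file is hosted."""
    parameters = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "iiprop": "url",
        "titles": file_title,
    }
    data = session.get(url=ENDPOINT, params=parameters).json()
    # Default JSON keys pages by page id, so take the first value.
    page = next(iter(data["query"]["pages"].values()))
    return page["imageinfo"][0]["url"]


def fetch_potd(date_string, session=SESSION):
    """Return the file name and address of the picture for one date."""
    # Assumed title pattern for the protected POTD template.
    title = "Template:POTD protected/" + date_string
    parameters = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "images",
        "titles": title,
    }
    data = session.get(url=ENDPOINT, params=parameters).json()
    filename = data["query"]["pages"][0]["images"][0]["title"]
    return {
        "filename": filename,
        "image_src": fetch_image_url(filename, session),
        "date": date_string,
    }
```

Neither function handles a missing template or a page without images; a fuller version would check for those cases before indexing into the response.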

Displaying title and date
We're trying to display some slightly different information now, so we need to change our HTML. We want to show the image, its file name with a link back to where it is hosted, as well as the current date.

Most of the information relating to the image should stay on the card div, and the relevant elements can be added dynamically, via the script. However, we also have a date to display. That should go in its own section, in case we want to include controls for changing the date.

Underneath our card div, add a new div to contain our date:

Now, inside our CSS file, add some styling for the new things we plan to have on the card...

...and for the date divs:

Updating our Javascript to handle the new data
Since we're now sending different JSON out from app.py, we're also going to have to alter our Javascript.

Near the top of our file, add a new variable to hold on to a reference to our new date div:

We'll be adding data to multiple page elements this time, so define a helper function for constructing the HTML:

Now, go inside the call to xmlHttp.onreadystatechange. Add these variables to contain the data from our JSON:

We have to add tags to the card data before adding it to the page:

Lastly, append the data to the page:

Final notes
We now have a complete web app for displaying Picture of the Day!

Postscript 1: Adding a description
We have the filename and image for the daily POTD, but how do we get at the nice descriptive text accompanying them on the Wikipedia page?

This is actually trickier than it seems at first glance.

The core modules of the Action API don't provide a plaintext version of article or template content. The response you get always needs to be parsed, and may contain wiki markup or HTML tags.

If you're not up to parsing text from the latest edit, via API:Revisions, there are two ways of getting around this.

One is to use API:Search to grab a snippet off the page. This snippet is always parsed already. Unfortunately, when it comes to POTD, the text in the snippet isn't always high quality -- if it's even there at all. However, since snippets are part of the core functionality of the Action API, you don't have to worry about incompatibilities, and the majority of snippets are adequate as preview text.

The code for accessing the snippet would look like this:
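A sketch of what such a query and its handling could look like. The choice to search the Template namespace (number 10), the helper names, and the hand-written sample response are assumptions for illustration:

```python
def snippet_parameters(page_title):
    """Build the query parameters for a one-result API:Search lookup."""
    return {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": page_title,
        # Assumption: the POTD text lives in the Template namespace (10).
        "srnamespace": "10",
        "srprop": "snippet",
        "srlimit": "1",
    }


# Hand-written sample; the title and snippet text are placeholders.
SAMPLE_RESPONSE = {
    "query": {
        "search": [
            {"ns": 10, "title": "Template:Example", "snippet": "An example description."}
        ]
    }
}


def extract_snippet(response):
    """Return the first snippet, or an empty string when nothing matched."""
    results = response["query"]["search"]
    if not results:
        return ""
    return results[0].get("snippet", "")
```

Returning an empty string for an empty result list covers the case mentioned above, where the snippet is not there at all.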

Another way to get at the description depends on the particular wiki you're trying to hit.

Wikis that have the CirrusSearch extension to API:Search allow you to access the text of an article via ?action=cirrusdump appended to the end of a page url.

Note that, while most of the markup is stripped out, some escape characters are included.

The results from cirrusdump are of better and more consistent quality than from API:Search snippets, but not all wikis have the extension installed. Using this to access the description for POTD is left as an exercise for the reader.

Postscript 2: User interaction
We can see the picture for the current date, but what if we want to change it?

Adding user interaction to our app

Sample app
A complete version of this app is available on GitHub, at ___.