User:Martyav/Apps/Tutorial

This is a tutorial for building a simple web app that uses the MediaWiki Action API to access data from Wikipedia. Specifically, the app displays the current Wikipedia Picture of the day. If you are confused by the terminology used in this tutorial, consult our glossary for basic definitions.

Programming languages
This app uses a web framework called Flask. It allows us to write our Action API requests in Python.

There is also some template code, written in HTML, for automatically generating a web page based on the results from our request. There is an accompanying CSS file for styling the web page.

Finally, there is some Javascript code handling user interactions, such as sending the GET request once the user clicks a button, and dynamically updating the results.

Setting up Python
This tutorial uses Python 3.

You can download the latest Python version from here:


 * Python for Windows 7, 8, and 10
 * Python for Mac OS X

If your operating system is Windows XP, a Linux distro, or something else, see the Python beginner's guide for further instructions on installation.

Installing Python 3 shouldn't overwrite or remove any Python 2 installation you may currently have, and you'll still be able to run older software that depends on Python 2. In most cases, having multiple versions of Python won't cause any conflicts, though you may need to specify the version you want when running scripts.

See Can I install Python 3.x and 2.x on the same Windows computer, Switch between python 2.7 and python 3.5 on Mac OS X, and Install and run Python 3 at the same time as Python 2 (Ubuntu Linux) for your respective operating system.

Setting up Flask
There is a tool called pip, which should have come with your Python installation. You can check by typing  into your command line interface. Pip lets us easily download and install the Python libraries and frameworks we need.

If you don't have it already, see Pip installation on the official website.

With Pip installed, all you need to do is type  into your command line to get the latest version of Flask.

Hello world in Flask
Create a new directory for your app, and name it whatever you like. If you visit the Github repo, the final version of this app is named Picture of the day viewer.

Open your new directory and create a new file to contain the main code for the app. We'll name ours app.py.

To this new file, add the following:

Save the file, and type  into your command line application. This will start a local server, running the code you've just written. You can view the web page in your browser by visiting http://localhost:5000

If Flask is successfully installed, you should see the words "Hello world!" displayed in the default font, on a mostly blank web page:

Making the API query
Now that we have everything set up and know Flask is working, we can start writing our code for hitting the Action API.

Import Requests
The Action API works by sending back data in response to a HTTP request. Since making HTTP requests is so important to the key functionality of our app, we should import the Python Requests library, as it will make our lives easier.

Add the following line to the top of your file:

If you don't already have the library installed, make sure to run the following command in your command line application:

The basic GET request structure
Now, remove the old function,, and add this in its place:

The call to SESSION.get adds  and   from PARAMETERS to the end of the ENDPOINT string. This creates a query that looks like this:

SESSION.get sends the query immediately. If it succeeds, we get back the data we wanted, inside RESPONSE. If it fails, we usually get an error message within the response, though in some cases, it may fail silently -- for example, if we try to do something without having the proper permissions.

Functions making GET requests to the Action API have a fairly regular structure. This code is only missing one thing: the particular API we are trying to hit.

Modules
The Action API is made up of numerous smaller APIs, or modules; we will use the two terms interchangeably for the remainder of this tutorial.

For GET requests within the Action API, these smaller APIs generally fall into two broad categories: props and lists.

Props & lists
To make a GET request hit the particular API you want to hit, add  or   to PARAMETERS.

In many cases, prop modules are used for accessing data within a single page, while list modules are used for accessing data across an entire wiki; however, there is overlap. The main difference is how they structure the data they return, within the response.

Prop modules return data nested within an object in the the  element of the response, while list modules return data directly inside the   element.

For developers, all this means is that you need to access a different place within the JSON to get at the data: either  for properties, or   for lists.

Limits
Both props and lists have limits on the amount of data they can return in one go. The maximum number of items allowed in the response at one time depends on your account. Bots and sysops have a maximum limit of 5000, while ordinary users have a maximum limit of 500.

The default limit for most modules is 10 items. You can see more data by making the same query, just with the keyword  appended.

Example: Random, a list module
One of the simplest modules to work with is API:Random. Its response is fairly straightforward. Unlike other modules, it returns a single item by default: a randomly selected wiki page.

To hit API:Random, go back inside the function, action_api, and add  to PARAMETERS:

Now run app.py.

You should see something like this on http://localhost:5000:

Understanding the response
The data we requested is within a key which has the same name as the module -- in this case,. The data is packaged as an array of objects.

Modules that return pages don't directly return an address to the page; by default, they generally just return an id and title. The title is the current name for the page, while the id is a unique identifier which stays the same across moves, edits, and re-names. We can use either one to build a URI to access the page.

The structure for this URI is the base address for the wiki, then, then the title of the page. So, for our response, the final address to the page would look like this: https://en.wikipedia.org/wiki/Mallabhum_Institute_of_Technology

Modules that return pages also include one additional bit of information -- an ns value. Ns stands for namespace. Namespaces are how MediaWiki broadly classifies pages. Articles, discussion pages, and help pages all belong to different namespaces.

Each namespaces has a numeric code to identify it. The main namespace, where wiki articles live, is namespace 0. Discussion pages are namespace 1, and help pages are namespace 12.

Example: Images, a prop module
Since we're trying to build an app to view the Wikipedia Picture of the Day, we need to use API:Images.

API:Images is a little more complicated than API:Random. It requires passing some more parameters in the query, and it returns multiple items at once.

In addition, unlike API:Random, which is a list, API:Images is a prop, so the response from it looks a little different.

Finally, like most property modules, API:Images relates to the properties of a page: namely, the images embedded in it. We could specify multiple pages, but for this example, we'll only query one.

To hit API:Images, go back inside the function, action_api, and remove  from PARAMETERS.

Add  where   once was. Then, underneath this line, add, like so:

Now run app.py again. You should see something like this:

Understanding the response
Props return data nested within an element corresponding to the page id. Thankfully, we do not actually need to know this id ahead of time to access the data; we can just index into the first object contained within the  element, if we have a single page.

Within this element, we have individual items listed off inside an array, with their namespace and title. Each item represents one image. The title, in this case, is the file name of the image.

Again, as with API:Random, we only get the title of the image, not the image's address. However, unlike API:Random, it is not simply a matter of appending the file name to a base address to get the right URI for the image. If we want the address to display an image, we need to use an additional API -- API:Imageinfo.

Example: Image info, with module parameters
Because image addresses are more complicated than page addresses, we need to hit API:Imageinfo to gain access to where the image is hosted.

All modules have special parameters associated with them, which alter what kind of data is returned. For example, API:Imageinfo has. By default, API:Imageinfo just returns the timestamp and username associated with the last modification of the image. Adding  to our query will also retrieve the image's url.

Let's go back to to PARAMETERS and update it so we're now hitting API:Imageinfo.

Run app.py, and you should see this:

Understanding the response
Here, within the  element, we see that the id key is , because this image lacks a page id. Some images, such as, do have unique page id's, but this one does not, perhaps because it is in a shared repository on Wikimedia Commons, instead of directly uploaded to Wikipedia.

The  returned contains three URI's: ,  , and. The first is the one that we want. It's a direct link to the image itself. The second and third ones link to pages containing some meta-data about the image, such as a brief description, the user who originally uploaded it, and so on.

We now have the information we need to build a Flask app displaying the Wikipedia Picture of the Day.

Displaying the page
We should now have a file containing code that looks like this:

We have our code for hitting an API, but it's not quite a full app yet. When we run it, all it does is display some raw JSON. Our code also isn't flexible enough yet to correctly query the Wikipedia Picture of the Day. We'll have to combine several API calls to display the picture. Finally, we can only make a request when we open or refresh the page.

We're going to have to add a few more files, and import some things from Flask to get it all working together.

Adding the HTML file
Let's create a new file. We'll name it index.html. Add some basic HTML 5 boilerplate to it, a few elements, and save it in the same directory as app.py.

Rendering the page
If you were to run app.py at this point, you wouldn't see anything different. The app doesn't know about this file yet.

We need to connect the two files. Replace  inside of action_api with

Now the app knows about the web page -- but it returns an error?

Template directory
Flask expects to see a directory named templates in our app. If it's not there, it won't know what to do when we call render_template.

Templates is where we put files which contain some dynamic elements. In our case, we place index.html in there because we'll be dynamically updating it to display the results from our API call.

Although Flask uses the Jinja framework to render templates, plain HTML will also work fine. We leave re-writing the page in Jinja syntax as an exercise for the reader.

After creating a templates directory, putting index.html inside, and getting render_templates to run, we now see a mostly blank page, displaying Picture of the day: in big h1 letters.

Static directory
We have a web page. We have an API call. Both are working...but only one or the other is being displayed. How do we update the web page to display the API call?

The Python in app.py can't do so directly. We need to create a script file, to hold some Javascript that will update the page for us. Flask uses a directory named static to contain any helper files that stay the same throughout the lifecycle of the app. We'll be adding our Javascript file there. We'll also add a CSS file to style the page there, later.

Adding the CSS
We have some elements on the page now, and data to display. It's pretty rough on the eyes, though. Let's style it.

Create a new file in the static directory, and name it what you like. Ours is called style.css.

We'll be adding some colors and features based on the Wikimedia Style Guide.

The app lifecycle
When we run app.py and render the page, the script file starts up. It waits to fire until the page is fully loaded. The first thing it fires off is a POST request, aimed at the page itself.

App.py is waiting for this POST request. It indicates that the page is now loaded and ready to accept data. So, our Python code fires off a GET request to the API. It returns the data in JSON format.

The Javascript is still active and monitoring whenever the request state changes. It sees that the request is done. It also sees that the request succeeded. It parses whatever is inside the response from the request. The response contains the JSON we just packaged in app.py.

The Javascript takes the JSON, finds the data, adds HTML tags around it, and appends it to the potd element on our page.

The HTML file is waiting. Until now, it's just been displaying whatever we wrote into it. Now, with the changes to the potd element, it updates and renders the data with tags and styling.

Sample app
A complete version of this app is available on Github, at ___.