User:Martyav/Apps/Tutorial

This is a tutorial for building a simple web app that uses the MediaWiki Action API to access data from Wikipedia. Specifically, the app displays the current Wikipedia Picture of the day.

Programming languages & concepts
This app uses a web framework called Flask. It allows us to write our Action API requests in Python.

Flask comes packaged with Jinja. Jinja uses a template to generate a web page based on the results from our request. For the most part, the template file looks exactly like standard HTML, with some special markup for dynamic content.

The API results we will be working with are all JSON.

Finally, the app has an accompanying CSS file for styling the web page.

Setting up Python
This tutorial uses Python 3.

You can download the latest Python version from here:


 * Python for Windows 7, 8, and 10
 * Python for Mac OS X

If your operating system is Windows XP, a Linux distro, or something else, see the Python beginner's guide for further instructions on installation.

Setting up Flask
In your command line interface of choice, run

Note: Pip is a package manager that should have come with your Python installation.

If you don't have it already, see Pip installation on the official website.

Hello world in Flask
Create a new directory for your app, and name it whatever you like. If you visit the GitHub repo, the final version of this app is named Picture of the day viewer.

Open your new directory and create a new file to contain the main code for the app. We'll name ours app.py.

To this new file, add the following:
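A minimal sketch of what app.py might contain at this stage, assuming a single route at "/" that returns plain text. The function name hello_world is our own choice for illustration; any valid name works.

```python
from flask import Flask

# Create the application object; __name__ tells Flask where to look for resources.
app = Flask(__name__)


@app.route("/")
def hello_world():
    # A returned string is sent to the browser as the body of the page.
    return "Hello world!"


if __name__ == "__main__":
    # Start Flask's development server, which listens on port 5000 by default.
    app.run()
```

With the main guard in place, running the file directly with Python starts the server, while importing it elsewhere does not.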

Save the file, and run it. This will start a local server, running the code you've just written. You can view the web page in your browser by visiting http://localhost:5000

If Flask is successfully installed, you should see the words "Hello world!" displayed in the default font, on a mostly blank web page.

Making the API request
Now that we have everything set up and know Flask is working, we can start writing our code for hitting the Action API.

Import Requests
The Action API works by sending back data in response to an HTTP request. Since making HTTP requests is so important to the key functionality of our app, we should import the Python Requests library, as it will make our lives easier.

Add the following line to the top of your file:

If you don't already have this library installed, make sure to run the following command:

The basic GET request structure
Now, remove the old hello-world function, and add this in its place:

Passing PARAMETERS along with the request adds each key and value from parameters to the end of the endpoint string, which creates the full query.
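How the parameters end up on the endpoint string can be sketched with the standard library alone. This assumes the English Wikipedia endpoint and a PARAMETERS dictionary holding only action and format; the helper name build_query_url is hypothetical. Requests builds the same kind of address for us when it is given a dictionary of parameters.

```python
from urllib.parse import urlencode

# Assumed values for illustration: the English Wikipedia endpoint
# and the two parameters every query needs.
ENDPOINT = "https://en.wikipedia.org/w/api.php"
PARAMETERS = {"action": "query", "format": "json"}


def build_query_url(endpoint, parameters):
    # Join the endpoint and the encoded key=value pairs with a question mark.
    return endpoint + "?" + urlencode(parameters)


query_url = build_query_url(ENDPOINT, PARAMETERS)
print(query_url)
```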

The GET call sends the query immediately. If it succeeds, we get back the data we wanted, inside the response. If it fails, we usually get an error message within the response, though in some cases, it may fail silently -- for example, if we try to do something without having the proper permissions.

Functions making GET requests to the Action API have a fairly regular structure. This code is only missing one thing: the particular API we are trying to hit.

Modules
The Action API is made up of numerous smaller APIs, or modules; we will use the two terms interchangeably for the remainder of this tutorial.

For GET requests within the Action API, these smaller APIs generally fall into two broad categories: props and lists.

Props & lists
In many cases, prop modules are used for accessing data within a single page, while list modules are used for accessing data across an entire wiki; however, there is overlap. The main difference is how they structure the data they return, within the response.

Prop modules return data nested within an object in the pages element of the response, while list modules return data directly inside the query element.

For developers, all this means is that you need to access a different place within the JSON to get at the data:


 * ["query"]["pages"][page id][module name] for properties
 * ["query"][module name] for lists.

To make a GET request to a particular API, add prop or list, set to the name of the module, to PARAMETERS.
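The two access paths described above can be sketched as a pair of small helpers. The response dictionaries here are illustrative shapes only, with placeholder ids and titles rather than real API output, and the helper names are hypothetical.

```python
# Illustrative shape of a list module response (placeholder values).
list_response = {
    "query": {
        "random": [{"id": 1, "ns": 0, "title": "Example page"}]
    }
}

# Illustrative shape of a prop module response (placeholder values).
prop_response = {
    "query": {
        "pages": {
            "1": {
                "pageid": 1,
                "ns": 0,
                "title": "Example page",
                "images": [{"ns": 6, "title": "File:Example.jpg"}],
            }
        }
    }
}


def list_results(response, module):
    # List modules put their data directly inside "query".
    return response["query"][module]


def prop_results(response, page_id, module):
    # Prop modules nest their data under "pages", keyed by page id.
    return response["query"]["pages"][page_id][module]


print(list_results(list_response, "random"))
print(prop_results(prop_response, "1", "images"))
```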

Example: Random, a list module
One of the simplest modules to work with is API:Random. Its response is fairly straightforward. Unlike other modules, it returns a single item by default: a randomly selected wiki page.

To hit API:Random, go back inside the function, action_api, and add list=random to PARAMETERS:

To display the results on the web page, import the json library alongside requests:

Also, inside action_api, change the last line:

Now run app.py.

You should see something like this on http://localhost:5000:

The data we requested is within a key which has the same name as the module -- in this case, random. The data is packaged as an array of objects.

Modules that return pages don't directly return an address to the page; by default, they generally just return an id and a title.

The title is the current name for the page, while the id is a unique identifier which stays the same across moves, edits, and renames.

We can use either one to build an address to access the page. The structure for the title-based address is the base address for the wiki, then /wiki/, then the title of the page, with spaces replaced by underscores. So, for the response in our example, the final address to the page would look like this: https://en.wikipedia.org/wiki/Mallabhum_Institute_of_Technology
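Building a page address from a title can be sketched as follows. The helper name page_url and its base argument are our own, chosen for illustration; the base address assumes English Wikipedia.

```python
from urllib.parse import quote


def page_url(title, base="https://en.wikipedia.org"):
    # Wiki addresses use underscores in place of spaces.
    slug = title.replace(" ", "_")
    # Percent-encode anything unsafe, leaving "/" and ":" readable.
    return base + "/wiki/" + quote(slug, safe="/:")


print(page_url("Mallabhum Institute of Technology"))
```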

Modules that return pages also include an additional bit of information -- an ns value. Here, ns stands for namespace.

Namespaces are how MediaWiki broadly classifies pages. Articles, discussion pages, and help pages all belong to different namespaces. Each namespace has a numeric code to identify it. The main namespace, where wiki articles are hosted, is namespace 0. Discussion pages are namespace 1, and help pages are namespace 12.

Example: Images, a prop module
Since we're trying to build an app to view the Wikipedia Picture of the Day, we need to use API:Images.

API:Images requires passing some more parameters in the query, and it returns multiple items at once, instead of just one.

In addition, unlike API:Random, which is a list, API:Images is a prop, so the response looks a little different.

Finally, like most property modules, API:Images relates to the properties of a page: specifically, the images embedded in it.

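To hit API:Images, go back inside action_api, and remove list=random from PARAMETERS.

Add prop=images where list=random once was. Then, underneath this line, add a titles parameter naming the page whose images we want, like so: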

Now run app.py again. You should see something like this:

Props return data nested within an element corresponding to the page id.

Getting at the data inside pages without knowing the page id ahead of time can be tricky. One way around it is to specify the version format of our JSON, by adding formatversion=2 to PARAMETERS.

JSON version 2 returns pages as an array, which is easily indexed into.

Another way to work with pages without explicitly knowing page ids ahead of time is to iterate into it, using Python's built-in next function.
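Both approaches can be sketched side by side. The responses below are illustrative shapes with a placeholder page id and titles, not real API output.

```python
# One page record, reused in both illustrative responses (placeholder values).
page = {
    "pageid": 12345,
    "ns": 0,
    "title": "Example page",
    "images": [{"ns": 6, "title": "File:Example.jpg"}],
}

# Default format: "pages" is an object keyed by page id.
v1_response = {"query": {"pages": {"12345": page}}}

# With formatversion=2: "pages" is an array.
v2_response = {"query": {"pages": [page]}}

# Version 2: index straight into the array.
page_v2 = v2_response["query"]["pages"][0]

# Default format: take the first value without knowing its key.
page_v1 = next(iter(v1_response["query"]["pages"].values()))

print(page_v1["images"][0]["title"])
print(page_v2["images"][0]["title"])
```

The next-based version only looks at the first page, which is enough when the query names a single title.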

Either way, inside this object, we have individual items listed off with their namespace and title. Each item represents one image. The title, in this case, is the file name of the image.

Again, as with API:Random, we only get the title of the image, not the image's address. However, unlike API:Random, it is not simply a matter of appending the file name to a base address to get the right url for the image. If we want the address to display an image, we need to use an additional API -- API:Imageinfo.

Example: Image info, a prop module with module parameters
Because image addresses are more complicated than page addresses, we need to hit API:Imageinfo to get the correct image url.

All modules have special parameters associated with them, which alter what kind of data is returned. For example, API:Imageinfo has iiprop. By default, API:Imageinfo just returns the timestamp and username associated with the last modification of the image. Adding iiprop=url to our query will also retrieve the image's url.

Let's go back to action_api and update it so we're now hitting API:Imageinfo.

Run app.py, and you should see this:

Here, within the pages element, we see that the id key is -1, because this image lacks a page id. Some images do have unique page ids, but this one does not, perhaps because it is in a shared repository on Wikimedia Commons, instead of directly uploaded to Wikipedia.

The imageinfo returned contains three distinct urls: url, descriptionurl, and descriptionshorturl. The first is the one that we want. It's a direct link to the image itself. The second and third ones link to pages containing some meta-data about the image, such as a brief description, the user who originally uploaded it, and so on.
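Picking the direct image address out of an API:Imageinfo response might look like the sketch below. The file title and the example.org addresses are placeholders invented for illustration, and first_image_url is a hypothetical helper name.

```python
# Parameters for an imageinfo query; the file title is a placeholder.
PARAMETERS = {
    "action": "query",
    "format": "json",
    "prop": "imageinfo",
    "iiprop": "url",
    "titles": "File:Example.jpg",
}

# Illustrative response shape with placeholder addresses.
sample_response = {
    "query": {
        "pages": {
            "-1": {
                "ns": 6,
                "title": "File:Example.jpg",
                "imageinfo": [
                    {
                        "url": "https://example.org/images/Example.jpg",
                        "descriptionurl": "https://example.org/wiki/File:Example.jpg",
                        "descriptionshorturl": "https://example.org/w/index.php?curid=0",
                    }
                ],
            }
        }
    }
}


def first_image_url(response):
    # Take the first page without needing to know its id.
    page = next(iter(response["query"]["pages"].values()))
    # "url" is the direct link to the image file itself.
    return page["imageinfo"][0]["url"]


print(first_image_url(sample_response))
```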

Displaying the page
We have our code for hitting the Action API, but it's not quite a full app yet. When we run it, all it does is display some raw JSON.

Template directory
Create a new directory, named templates.

Flask expects to see a directory named templates in our app. Templates is where we put files which contain some dynamic elements. In our case, we'll be placing the index page of our app here, because we'll be dynamically updating it to display the results from our API calls.

Flask templates use Jinja to render dynamic elements. Jinja markup looks like this -- {{ variable }} -- and is used to inject Python variables or expressions into our page.

Static directory
Create a new directory, named static.

Flask uses static to contain any helper files that stay the same throughout the lifecycle of the app. We'll add a CSS file to style the page here.

Adding the template
Let's create a new file inside the templates directory. We'll name it index.html. Add some basic HTML5 boilerplate and a few elements to it, and save it in templates, where Flask expects to find it.

With that scaffolding established, now add a placeholder for our dynamic content -- the data from our response:

Adding the CSS
Create a new file in the static directory, and name it style.css. We'll be adding some colors and visual motifs based on the Wikimedia Style Guide.

First, the basic page elements:

Now, some styling for a new CSS class wrapping around our picture. This will make the app look more polished:

Connecting index.html with style.css
Finally, we will need to reference the stylesheet and script on our web page before any of the magic happens. Update the head element with references to our files, and add a new div referencing the new CSS class:

Rendering the page
If you were to run app.py at this point, you wouldn't see anything different. The app doesn't know about this file yet. We need to connect the two files.

At the top of app.py, import render_template alongside Flask:

Create a new function alongside action_api. Since it's rendering the index page, we're naming it index.

Now the app knows about the web page, and will pass the response from action_api to the template.

Understanding the app lifecycle
We now have four files:

app.py

/static/script.js

/templates/index.html

/static/style.css

When we run app.py and render the page, the JavaScript file starts up. It waits to fire until the page is fully loaded. The first thing it fires off is a POST request, aimed at the page itself.

App.py is waiting for this POST request. It indicates that the page is now loaded and ready to accept data. App.py fires off a GET request to the Action API, which returns the data in JSON format.

The JavaScript is still active and monitoring the request. It sees that the request is done. It also sees that the request succeeded. It parses whatever is inside the response from the request. This currently contains the JSON we just packaged in app.py.

The JavaScript now takes the JSON, retrieves the data, adds HTML tags around it, and appends it to the potd div on our page.

Until now the HTML file has just been displaying whatever we initially wrote into it. Now, with the changes to the potd div, it updates and re-renders the page with our new tags and styling.

Picture of the day viewer
The final sections of this tutorial relate how to make our app work with Wikipedia:Picture of the day.

Picture of the day, or POTD, is a featured image displayed on the home page of Wikipedia.

We'll be hitting an endpoint containing a wiki template that changes every day, and using the data we find there to get at the image and some descriptive text about it.

Getting today's date
The first order of business is simply knowing what day it is.

Because POTD updates daily, we need today's date to access the archives and get at a stable version of the correct picture.

We're going to import another native Python library.

Go into app.py, and add this line near the top of the file:

Then, in index, add a variable containing the current date:

We generate a string from date.today(), because it actually returns a date object, not a string.

The variable todays_date is in the format YYYY-MM-DD, which thankfully is the exact format in which dates are listed within the Wikipedia POTD archives.
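The conversion from date object to string can be sketched like this. The helper format_date is hypothetical, added only so the behavior can be checked against a fixed date.

```python
from datetime import date


def format_date(day):
    # str() on a date object yields its ISO 8601 form, YYYY-MM-DD.
    return str(day)


todays_date = format_date(date.today())
print(todays_date)
```

Calling day.isoformat() produces the same string and states the intent a little more explicitly.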

Refactoring app.py
We're now going to re-arrange a few things inside app.py.

To get all the data we need, we'll be making several Action API calls, so having a single function named action_api doesn't make sense anymore.

First, pull out the variables session and endpoint from action_api. Add them near the top of the file, outside of any function. They're constants in the module scope, so change their variable names to be in all-caps, as described in the Python style guide, PEP 8:

Next, define a new function, fetch_potd(date):

We access the protected POTD page to get at the most stable version of the image in the archives. This gives us the information we need to reach the image. However, we don't have the address to the image quite yet.
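The parameters for the first request inside fetch_potd might be assembled as in the sketch below. It assumes the protected page for a given day is titled "Template:POTD protected/" followed by the date, and it uses formatversion 2; both are assumptions to check against the wiki, and potd_parameters is a hypothetical helper name.

```python
from datetime import date


def potd_parameters(day):
    # Assumed title pattern for the protected POTD template of a given day.
    title = "Template:POTD protected/" + day.isoformat()
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "images",
        "titles": title,
    }


print(potd_parameters(date(2019, 1, 31))["titles"])
```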

Define another function, named fetch_image_url:

Update fetch_potd to include a call to this function:

Finally, alter index to call fetch_potd:

Displaying title and date
We're trying to display some slightly different information now, so we need to change our HTML. We want to show the image, its file name with a link back to where it is hosted, as well as the current date.

Most of the information relating to the image should stay on the card div, and the relevant elements can be added dynamically, via the script. However, we also have a date to display. That should go in its own section, in case we want to include controls for changing the date.

Underneath our card div, add a new div to contain our date:

Now, inside our CSS file, add some styling for the new things we plan to have on the card...

...and for the date divs:

Updating our JavaScript to handle the new data
Since we're now sending different JSON out from app.py, we're also going to have to alter our JavaScript.

Near the top of our file, add a new variable to hold on to a reference to our new date div:

We'll be adding data to multiple page elements this time, so define a helper function for constructing the HTML:

Now, go inside the call to xmlHttp.onreadystatechange. Add these variables to contain the data from our JSON:

We have to add tags to the card data before adding it to the page:

Lastly, append the data to the page:

Final notes
We now have a complete web app for displaying Picture of the Day!

Postscript 1: Adding a description
We have the filename and image for the daily POTD, but how do we get at the nice descriptive text accompanying it on the Wikipedia page?

This is actually trickier than it seems at first glance.

The core modules of the Action API don't provide a plaintext version of article or template content. The response you get always needs to be parsed, and may contain wiki markup or HTML tags.

If you're not up to parsing text from the latest edit, via API:Revisions, there are two ways of getting around this.

Snippets
One is to use API:Search to grab a snippet off the page. This snippet is always parsed already. Unfortunately, when it comes to POTD, the text in the snippet isn't always high quality -- if it's even there at all. However, since snippets are part of the core functionality of the Action API, you don't have to worry about incompatibilities, and the majority of snippets are adequate as preview text.

The code for accessing the snippet would look like this:
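A sketch of the snippet approach, assuming we search for the title of the POTD page and take the first result. The response shown is an illustrative shape with placeholder text, and both helper names are hypothetical.

```python
def snippet_parameters(page_title):
    # list=search runs a search; srsearch holds the text to search for.
    return {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": page_title,
        "srlimit": "1",
    }


def first_snippet(response):
    # Search is a list module, so results sit directly inside "query".
    results = response["query"]["search"]
    return results[0]["snippet"] if results else None


# Illustrative response shape (placeholder values).
sample_response = {
    "query": {
        "search": [{"ns": 10, "title": "Example page", "snippet": "Placeholder text"}]
    }
}

print(first_snippet(sample_response))
```

Returning None for an empty result list covers the case where no snippet is available at all.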

CirrusSearch
Another way to get at the description is dependent on the particular wiki you're trying to hit.

Wikis that have the CirrusSearch extension to API:Search allow you to access the text of an article through the API.

Note that, while most of the markup is stripped out of the text in the API response, some escape characters will be included.

The results are of better and more consistent quality than from API:Search snippets, but not all wikis have the CirrusSearch extension installed. Using this extension to access the description for POTD is left as an exercise for the reader.

Postscript 2: User interaction
We can see the picture for the current date, but what if we want to see the picture for yesterday, or even tomorrow?

This information is actually pretty easy to access. Wikipedia keeps all the POTD templates in an archive, dating back to May 14, 2004. In addition, the picture for the day is always added one day ahead of time, so we can access tomorrow, too.

If we were simply curious about other dates, all we would need to do to our code is change it to point to any valid POTD date. In order to access different dates via the web app, though, we need to change a few more things:


 * 1) Add controls to our app, so users can change the date for themselves
 * 2) Style the new controls
 * 3) Alter app.py, to respond to what the user is doing on the page
 * 4) Handle dates that are out of range

Adding controls
Go back to index.html. Inside the date container, add another div, for our new control scheme. We're giving ours the class, "date-picker," so we can style it later:

Recall that app.py works by responding to GET and POST requests. While we could set up buttons with event handlers to send off requests, it's simpler to just use an HTML form, and treat our controls as form inputs:

Here, our inputs are submit buttons, with two different values, "← Back" or "Next →". We could have also used a slider over a range of dates, a text box, or even an HTML5 date input to bring up a calendar widget. Each kind of control presents different challenges, in terms of input validation, accessibility, and browser support.

Implementing different controls is left as an exercise for the reader; see the MDN web docs for more information on HTML input types if you wish to explore this more.

Styling the controls
Since we've added new HTML elements, we need to style them.

Go to your css file, and add the following:

Updating app.py
Now, let's prepare app.py to view pictures for other dates.

Altering the date
The first thing we need to do is pull out the date variable from index, so that it is accessible to the rest of the file, and can be updated by other functions. Create a new variable in the module scope, and put it right under the import statements:

We want to go backward or forward in time, so we also need to accurately add or subtract from current_date. Python's timedelta allows us to do just that. Add it to our datetime imports:

Now, define functions for incrementing and decrementing current_date with timedelta:
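The date arithmetic itself can be sketched with two small functions. Unlike the version in app.py, which updates the module-level current_date, these take a date and return a new one, which keeps them easy to test; the function names are our own.

```python
from datetime import date, timedelta


def increment_date(day):
    # Adding a one-day timedelta moves forward, rolling over months and years.
    return day + timedelta(days=1)


def decrement_date(day):
    # Subtracting a one-day timedelta moves backward in the same way.
    return day - timedelta(days=1)


print(increment_date(date(2019, 1, 31)))
print(decrement_date(date(2019, 3, 1)))
```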

Adding a new route for our form
We now need to add the form action, "/change_date", as a new route to app.py. Routing is how Flask knows which function to call in response to a request for a given address.

Until now, we've had a single route: "/". This route was associated with the function, index, which renders the index page. Our new route will be associated with its own function, named change_date.

We want to use the new route to update the index page. However, the default behavior would be to send the user to a page with the address of the route. To have change_date behave as we wish, we need it to change the date, then redirect back to "/" and let index handle the actual page rendering.

Flask's redirect method allows us to point to another url, so let's import it and make change_date call it at the very end:

Updating current_date based on user input
When a user selects an input, our form will pass app.py a dictionary, where the key is the name of the button selected, and the value is the button's value. If we try to open up a key and the key is not there, the page will throw an error. To handle any possible error gracefully, we place our code for the form data inside a try/except block:

If the key, "change_date", is there, check the value and respond accordingly:

If it's not there, just default to showing today's picture:

Outside of the try/except block, update current_date to be our new_date:

Because of possible confusion between globally and locally scoped variables, we have to specify that we want the current_date variable outside of our function:

Putting it all together, we have a new function that looks like this:
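The decision logic at the heart of change_date can be sketched without Flask, as a plain function over a dictionary. In the app itself, the dictionary would come from the submitted form and the result would be stored in current_date before redirecting; here resolve_new_date and its arguments are hypothetical names chosen so the logic can run on its own.

```python
from datetime import date, timedelta


def resolve_new_date(form, current_date, today):
    try:
        # Raises KeyError if the form did not include the button's name.
        choice = form["change_date"]
    except KeyError:
        # No usable input: default to showing today's picture.
        return today
    if choice == "← Back":
        return current_date - timedelta(days=1)
    if choice == "Next →":
        return current_date + timedelta(days=1)
    # Unrecognized value: leave the date unchanged.
    return current_date


print(resolve_new_date({"change_date": "← Back"}, date(2019, 1, 31), date(2019, 2, 1)))
```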

Handling dates that are out of range
Now that we can view pictures for arbitrary dates, we need to take into account the fact that not all dates have pictures. Anything before May 14, 2004, or after tomorrow, will fail.

Inside fetch_potd, put our initial attempt to access the response from the API into a try/except block:

If our initial attempt fails due to a lack of a picture for that date, pass on some JSON that indicates that the current date is out of range:
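A sketch of this fallback, assuming a formatversion 2 response in which a date without a picture yields a page record with no images key. The field names in the returned dictionaries are placeholders for illustration, not the ones used by the sample app.

```python
def potd_or_error(response, day_text):
    try:
        # Fails with KeyError or IndexError when the page has no images.
        title = response["query"]["pages"][0]["images"][0]["title"]
    except (KeyError, IndexError):
        # Signal to the page that there is nothing to show for this date.
        return {"error": "date out of range", "date": day_text}
    return {"filename": title, "date": day_text}


# Illustrative responses (placeholder values).
found = {"query": {"pages": [{"title": "Example page", "images": [{"title": "File:Example.jpg"}]}]}}
missing = {"query": {"pages": [{"title": "Example page", "missing": True}]}}

print(potd_or_error(found, "2019-01-31"))
print(potd_or_error(missing, "2001-01-01"))
```

Catching only KeyError and IndexError keeps unrelated failures, such as network errors, visible instead of silently reporting them as out of range.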

Advanced Flask
Flask also offers other ways to do what we're trying to do. Instead of having divs which we update via JavaScript, you could rewrite index.html to use Jinja syntax, with template regions for any dynamic content. You could also import WTForms and use these form components to handle user interactions. Taking this more idiomatic approach is left as an exercise for the reader.

Sample app
A complete version of this app is available on GitHub, at MediaWiki Action API Code Samples.