User:LBowmaker (WMF)/Airflow job in 5 mins

Pre-requisite knowledge:
 * Your job will be scheduled using a tool called Airflow (to learn more, click here).
 * The process requires you to copy and edit three files: a query file containing your Hive query, a Python file that schedules your query, and another Python file containing tests for your job.
 * You will need some basic knowledge of Git commands and code repositories.

Step 1: Create your query file
TO DO for DE: We should create a folder called something like 'product_analytics' in the repo below (we also likely need a Product Analytics instance of Airflow):

https://github.com/wikimedia/analytics-refinery/tree/master/hql

Run the following commands:
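The exact commands have not been filled in yet; the following is a plausible sketch, assuming you start from a fresh clone of analytics-refinery (the branch name is illustrative):

  # Clone the refinery repo and create a working branch
  git clone https://github.com/wikimedia/analytics-refinery.git
  cd analytics-refinery
  git checkout -b add-my-product-analytics-query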

Copy the example .hql file here, paste your query in, and create parameters as needed. Save your .hql file under the hql/product_analytics folder.
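Purely for illustration, a parameterized query might look like the sketch below; the table, column, and parameter names are assumptions, not taken from the example file:

  -- Hypothetical parameterized query: ${year}, ${month} and ${day} are
  -- substituted at run time rather than hard-coded.
  SELECT
      page_title,
      SUM(view_count) AS views
  FROM wmf.pageview_hourly
  WHERE year = ${year}
    AND month = ${month}
    AND day = ${day}
  GROUP BY page_title;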

To test your .hql file, run the following on a stat box:
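The exact command is also missing here; a minimal sketch, assuming the Hive CLI is available on the stat box and using the illustrative parameters from above:

  # Substitute concrete values for the query's parameters while testing
  hive --hivevar year=2024 --hivevar month=1 --hivevar day=1 \
       -f hql/product_analytics/my_query.hql

Check the output (or the destination table) to confirm the query behaves as expected.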

If everything is as expected, run:
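Presumably the usual commit-and-push sequence, sketched here with illustrative file, branch, and commit-message names:

  git add hql/product_analytics/my_query.hql
  git commit -m "Add product analytics query"
  git push origin add-my-product-analytics-query

Then open a pull request from your branch in the GitHub web interface.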

Check here to make sure your pull request was created successfully:

https://github.com/wikimedia/analytics-refinery/pulls

Make a note of the link to the PR for use later in the ticket.

Step 2: Create your scheduling file
Run the following commands:
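As in Step 1, the commands are not spelled out; a plausible sketch, assuming a fresh clone of airflow-dags (the branch name is illustrative):

  # Clone the DAGs repo and create a working branch
  git clone https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags.git
  cd airflow-dags
  git checkout -b add-my-product-analytics-dag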

Copy the example .py file here and edit it to schedule the query you created in Step 1. Save your .py file under the product_analytics/dag folder.
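The example file in the repo is the real starting point; purely as an illustration of the general shape of a scheduling file, a minimal Airflow DAG might look like the sketch below (the DAG id, schedule, operator choice, and paths are all assumptions, and the actual repo provides its own helpers and conventions):

  # Hypothetical minimal DAG; base your real file on the example .py
  # file in the airflow-dags repo rather than on this sketch.
  from datetime import datetime, timedelta

  from airflow import DAG
  from airflow.providers.apache.hive.operators.hive import HiveOperator

  with DAG(
      dag_id="product_analytics_my_job",  # illustrative name
      start_date=datetime(2024, 1, 1),
      schedule_interval="@daily",
      default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
      catchup=False,
  ) as dag:
      run_query = HiveOperator(
          task_id="run_my_query",
          # Path to the query file from Step 1 (illustrative); how the
          # ${year}/${month}/${day} parameters are filled in follows the
          # repo's own conventions.
          hql="hql/product_analytics/my_query.hql",
      )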

Then run:
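Presumably a commit-and-push sequence mirroring Step 1, again with illustrative names:

  git add product_analytics/dag/my_dag.py
  git commit -m "Add product analytics DAG"
  git push origin add-my-product-analytics-dag

Then open a merge request from your branch in the GitLab web interface.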

Check here to make sure your merge request was created successfully:

https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests

The repository's CI will run the tests against your files.
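To catch failures before pushing, you can usually also run the tests locally; a sketch, assuming the repo's test suite runs under pytest:

  # Run the test suite locally from the repo root (assumes pytest is installed)
  pytest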

Step 3: Create a ticket for DE review
Once the above steps are complete, create a ticket tagged with 'data-engineering' that includes the links to the pull request and merge request you submitted.

Someone from the DE team will review your changes, and if everything looks good, your job will be deployed.