Topic on Project:Support desk

How do I cache a short element of code?

87.123.197.137 (talkcontribs)

I have a custom tag in my wiki page. The tag processes a GitHub URL, retrieves information from the GitHub API and then gives it back for display in the wiki page. The information I get is in JSON format.

When the page cache is purged, the GitHub API is queried again every time, which is pointless if the information on GitHub.com has not changed. I want to be able to reuse the information I already got from GitHub.

What is the best way to store the information I received from GitHub?

I read Manual:Caching, but it does not help me.

MarkAHershberger (talkcontribs)
2001:16B8:1057:5500:201A:8CB3:E5B9:3EBB (talkcontribs)

Thanks for the hint!

I am now able to get information from the API and store it in the cache. :-) What I am still missing is a way to determine whether there actually was a change before querying the API... I have written this piece of code:

	<?php
		// Example issue endpoint; any GitHub API URL could be used here.
		$apiUrl = 'https://api.github.com/repos/wikimedia/mediawiki/issues/94';
		$cacheType = CACHE_DB;
		$cache = ObjectCache::getInstance( $cacheType );
		// Cache lifetime in seconds (1 day).
		$cacheTime = 60 * 60 * 24;

		$res = $cache->getWithSetCallback(
			$cache->makeKey( 'bugtracker', 'github', $apiUrl ),
			$cacheTime,
			// Only called when the key is not in the cache.
			function () use ( $apiUrl ) {
				$ch = curl_init();

				curl_setopt( $ch, CURLOPT_URL, $apiUrl );
				curl_setopt( $ch, CURLOPT_HEADER, false );
				// 'identity' (not 'identify') requests an uncompressed response.
				curl_setopt( $ch, CURLOPT_ENCODING, 'identity' );
				curl_setopt( $ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0' );
				curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
				curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
				// Raw JSON string, or false if the request failed.
				$res = curl_exec( $ch );
				curl_close( $ch );
				return $res;
			}
		);
		return $res;
MarkAHershberger (talkcontribs)

> What I am still missing is a way to determine whether there actually was a change before querying the API...

How would you determine if it were changed without querying? If you could do that, you wouldn't need to cache this at all.

Right now, though, you have the timeout set to one day. I would say change it to one hour so you don't have to wait so long.

2001:16B8:1078:7C00:A86F:7E18:23A9:C131 (talkcontribs)

> How would you determine if it were changed without querying? If you could do that, you wouldn't need to cache this at all.

My idea was to request only the headers first, check the value of Last-Modified or the ETag, and then request the content again only if it has really changed. The thought behind that was also to avoid hitting the GitHub rate limit so quickly. However, it seems that a headers-only request counts against the rate limit as well. So I would end up making two requests for one thing: always a request for the headers (which counts against the rate limit), and then a second request if the ETag has changed. That is not how I thought this would work... That way it would be cheaper to just always fetch the content again.

Or am I missing something?

MarkAHershberger (talkcontribs)

If the ETag has changed, you should be getting the updated response in the first request. That is, if you get 200 OK, you're also getting the updated page. If you're getting 304 Not Modified, then just use what you have.
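
What is described here is an HTTP conditional request: the ETag from the previous response is sent back in an If-None-Match header, and the server answers either 200 OK with a new body or 304 Not Modified with no body. The following is a minimal sketch in PHP with cURL, not code from this thread. The function names pickResponse and fetchConditional, the shape of the cached array, and the User-Agent string are assumptions made for illustration. GitHub's documentation also describes 304 answers to conditional requests as not counting against the rate limit, at least for authorized requests; that is worth checking against the current docs.

```php
<?php
/**
 * Decide which copy to keep after a conditional request.
 * Hypothetical helper: $cached is [ 'etag' => ?string, 'body' => string ] or null.
 */
function pickResponse( int $status, ?string $newEtag, $newBody, ?array $cached ): ?array {
	if ( $status === 304 && $cached !== null ) {
		// Not modified: the stored copy is still current.
		return $cached;
	}
	if ( $status === 200 && is_string( $newBody ) ) {
		// Changed (or first fetch): keep the new body and its ETag.
		return [ 'etag' => $newEtag, 'body' => $newBody ];
	}
	// Network error, rate limit, etc.: fall back to the stored copy, if any.
	return $cached;
}

/**
 * Fetch $url, sending If-None-Match when an ETag is already known.
 */
function fetchConditional( string $url, ?array $cached ): ?array {
	$headers = [ 'Accept: application/vnd.github+json' ];
	if ( $cached !== null && $cached['etag'] !== null ) {
		// Send the ETag back exactly as received, quotes included.
		$headers[] = 'If-None-Match: ' . $cached['etag'];
	}

	$etag = null;
	$ch = curl_init( $url );
	curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
	// Hypothetical User-Agent; GitHub expects some User-Agent to be set.
	curl_setopt( $ch, CURLOPT_USERAGENT, 'MyWikiBugtracker/0.1' );
	curl_setopt( $ch, CURLOPT_HTTPHEADER, $headers );
	// Capture the ETag response header as the headers arrive.
	curl_setopt( $ch, CURLOPT_HEADERFUNCTION, function ( $ch, $line ) use ( &$etag ) {
		if ( stripos( $line, 'etag:' ) === 0 ) {
			$etag = trim( substr( $line, 5 ) );
		}
		// cURL requires the number of bytes handled as the return value.
		return strlen( $line );
	} );
	$body = curl_exec( $ch );
	$status = (int)curl_getinfo( $ch, CURLINFO_HTTP_CODE );

	return pickResponse( $status, $etag, $body, $cached );
}
```

The array returned by fetchConditional could be stored with the cache object already in use, for example with `$cache->set( $key, $result, $ttl )`, so that both the ETag and the body are available the next time the page is purged.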
