Content translation/Caching

Redis is primarily used for caching.

Why are we caching?

We allow a translation to be edited over multiple sessions. A user may stop at some percentage of completion and come back later, or may hit network issues and reload the page.

Preparing translation support data is expensive, and some of it may come from paid services.

What are we caching?

1. Segmented content - segments

2. Link translation per language pair - caching Wikidata results

3. Dictionary information per language pair

4. Machine translation for the requested page, per language pair, so that the article loads faster the second time

5. Any other translation tool data, so that it can be served faster on later access

Cache expiry

To be decided, maybe after doing some tests with real data.

Alternatively, use the LRU algorithm (http://redis.io/topics/lru-cache) with a fixed memory size.
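To make the LRU option concrete, here is a toy in-process sketch of least-recently-used eviction. It is only an illustration: the class name LRUCache and the entry-count capacity are assumptions made for this example. Redis itself evicts by memory use (the maxmemory and maxmemory-policy settings) and uses an approximated LRU, not an exact one.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache bounded by entry count (hypothetical helper, not Redis)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        # A read counts as a use, so move the entry to the "recent" end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)
```

With a capacity of two, storing a third entry evicts whichever of the first two was touched least recently.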

Cache invalidation/purging

A cache hit happens when the revision ID, article, and language pair all match. If any of these variables changes, we need a selective cache refresh.

The following can be the cache invalidation strategy:

If the revision changes, i.e. the article was edited and a new version is available, rerun the whole translation support data calculation. But while doing that per segment, compute the SHA of the segment and check whether the cache has a matching SHA. If it does, that particular segment is unchanged in the new version.

This means that in Redis the keys for the segments and other cached items should be SHAs, so that checking whether content has changed is easy.
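The per-segment check described above could look roughly like the following sketch. The function names segment_key and refresh_segments are hypothetical, a plain dict stands in for Redis, and SHA-256 is an assumption, since the text only says "SHA".

```python
import hashlib


def segment_key(text):
    """Digest of a segment's content (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh_segments(new_segments, cache, compute):
    """Build support data for a new revision, reusing cached segments.

    new_segments: list of segment texts from the new revision
    cache:        dict mapping digest -> cached data (stand-in for Redis)
    compute:      callable that produces support data for one segment
    Returns (fresh, recomputed): data keyed by digest, and the texts
    that had no cached match and were therefore recomputed.
    """
    fresh = {}
    recomputed = []
    for text in new_segments:
        key = segment_key(text)
        if key in cache:
            # Same digest means the segment is unchanged in the new revision.
            fresh[key] = cache[key]
        else:
            fresh[key] = compute(text)
            recomputed.append(text)
    return fresh, recomputed
```

Only segments whose digest is missing from the cache trigger the expensive computation; everything else is copied over.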

title change ---> all new.

source language change ---> need to build everything new.

target language change ---> segmented content remains the same, link sources remain the same.

revision change ---> some of the segments change; find them by comparing SHA.
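The four rules can be read as a small decision function. The sketch below is one interpretation, not a settled design: the grouping into "segments", "link_sources" and "pair_data" (link translations, dictionary, machine translation) and the function name invalidation_plan are assumptions, as is the reading that per-language-pair data must be rebuilt when the target language changes.

```python
KEEP, CHECK_SHA, REBUILD = "keep", "check-sha", "rebuild"


def invalidation_plan(cached, requested):
    """Decide what to do with each group of cached data.

    cached / requested: dicts with the keys
    title, source_lang, target_lang, revision.
    """
    groups = ("segments", "link_sources", "pair_data")
    # Title or source language change: build everything new.
    if (cached["title"] != requested["title"]
            or cached["source_lang"] != requested["source_lang"]):
        return {group: REBUILD for group in groups}
    plan = {group: KEEP for group in groups}
    # Revision change: recompute only the segments whose SHA differs.
    if cached["revision"] != requested["revision"]:
        plan = {group: CHECK_SHA for group in groups}
    # Target language change: segments and link sources stay as they are,
    # data tied to the language pair is assumed to need a rebuild.
    if cached["target_lang"] != requested["target_lang"]:
        plan["pair_data"] = REBUILD
    return plan
```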

Redis data structure: a hash seems to be the appropriate data structure.

SHA(title+sourceLang+targetLang) seems to be an appropriate key.
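A minimal sketch of this layout follows, under several assumptions: SHA-256 as the digest, a separator character between the three parts (added here so that different combinations cannot concatenate to the same string), and a small in-memory class standing in for Redis. Its hset and hget methods mirror the Redis HSET and HGET commands; translation_key and InMemoryHashStore are hypothetical names.

```python
import hashlib


def translation_key(title, source_lang, target_lang):
    """Key for one article and language pair.

    The unit-separator character is an addition to the plain
    title+sourceLang+targetLang concatenation; SHA-256 is assumed.
    """
    raw = "\x1f".join((title, source_lang, target_lang))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class InMemoryHashStore:
    """Stand-in for Redis hashes: one named hash holds many fields."""

    def __init__(self):
        self._hashes = {}

    def hset(self, name, field, value):
        self._hashes.setdefault(name, {})[field] = value

    def hget(self, name, field):
        return self._hashes.get(name, {}).get(field)
```

In this layout each article and language pair gets one hash, and the fields inside it could be the per-segment SHAs, so a single lookup by key and segment digest answers whether cached data exists.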