User:Jeblad/challenge 2

From mediawiki.org

Challenges

Recent changes

How might you retrieve article change data in a systematic way and represent that data in a compelling visual way?

Recent changes after page load can be collected from the ordinary rc-feed, and a short summary given in a small window like the notification system. The summary should be limited to the pages of interest to the user, normally the pages on the watchlist extended with what's been visited in this session. It should also show changes on the discussion pages.
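A minimal sketch of the filtering, assuming the rc-feed entries have already been parsed into dicts with a namespace number and a title without the namespace prefix. The names subject_key and summarize and the dict layout are hypothetical. It relies on the MediaWiki convention that a talk namespace is odd and its subject namespace is one less, so changes on discussion pages are counted with their subject page.

```python
def subject_key(ns: int, title: str) -> tuple:
    """Map a page to its subject page.

    Talk namespaces are the positive odd ones; the matching subject
    namespace is ns - 1. Other namespaces are returned unchanged.
    """
    if ns > 0 and ns % 2 == 1:
        return (ns - 1, title)
    return (ns, title)


def summarize(changes, watchlist, visited):
    """Count changes per page of interest.

    `watchlist` and `visited` are sets of (ns, title) subject pages.
    Changes to a discussion page are counted under its subject page.
    """
    interesting = watchlist | visited
    counts = {}
    for change in changes:
        key = subject_key(change["ns"], change["title"])
        if key in interesting:
            counts[key] = counts.get(key, 0) + 1
    return counts
```

The resulting counts are small enough to show in a notification-style window; everything not on the watchlist or visited in the session is dropped.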

Everything necessary is more or less done and the data is available, but the code should be rewritten to use session storage instead of cookies. The additional display of the watchlist feed should be removed.

The watchlist block in session storage should be kept separate from the visited pages, as the watchlist can be very large. Separating them limits the client load during reconstruction. The load could also be limited by using keyed storage or SQL storage, but there is too little support for those at the moment.

Polling the rc-feed can be made more effective by slicing it into blocks on minute boundaries, thereby letting the proxy cache the requests more effectively. The rc-feed is initially polled each minute for ten minutes, then falls back to a two-minute interval for twenty minutes, and a five-minute interval for the remainder of a set time (an hour). After that the page is assumed to be abandoned and polling stops.
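The back-off can be sketched as a small table of phases. This is only an illustration under the numbers given above; the names SCHEDULE, next_interval and poll_times are hypothetical, and a real client would read the values from configuration.

```python
from typing import Optional

# (phase length in seconds, polling interval in seconds)
# 1 min interval for 10 min, 2 min for 20 min, 5 min until one hour has passed.
SCHEDULE = [(10 * 60, 60), (20 * 60, 120), (30 * 60, 300)]


def next_interval(elapsed: float, schedule=SCHEDULE) -> Optional[int]:
    """Return the polling interval for a page that has been open for
    `elapsed` seconds, or None once the page is assumed abandoned."""
    phase_end = 0
    for length, interval in schedule:
        phase_end += length
        if elapsed < phase_end:
            return interval
    return None


def poll_times(schedule=SCHEDULE) -> list:
    """List the offsets (seconds since page load) at which polls fire."""
    times, t = [], 0
    while True:
        interval = next_interval(t, schedule)
        if interval is None:
            return times
        t += interval
        times.append(t)
```

With these values a page left open for the full hour makes 26 requests (10 + 10 + 6), the last one at the one-hour mark.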

The first request after page load looks back to the minute boundary of the last poll, but limited to a set time (typically the last hour).

If the maximum request time makes it impossible to reach the end of the last poll, the watchlist must be reloaded.

The size of the time slices, the time to keep the polling alive, and the maximum request time must all be configurable. Large projects like the English one will need shorter times to keep the load down.

This polling scheme produces a very limited number of possible requests and should cache pretty well.
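A sketch of how the request window could be aligned to slice boundaries, assuming Unix timestamps in seconds and reading "maximum request time" as a cap on how far back one request may look. The names SLICE, MAX_LOOKBACK and request_window are hypothetical.

```python
SLICE = 60           # slice size in seconds
MAX_LOOKBACK = 3600  # assumed cap on how far back a request may reach


def floor_to_slice(ts: int, slice_size: int = SLICE) -> int:
    """Round a timestamp down to the nearest slice boundary."""
    return ts - ts % slice_size


def request_window(now: int, last_poll: int,
                   slice_size: int = SLICE,
                   max_lookback: int = MAX_LOOKBACK):
    """Return (start, end, complete) for the next rc-feed request.

    start and end both lie on slice boundaries, so clients polling within
    the same slice ask for the same range. complete is False when the cap
    prevents reaching the last poll, in which case the watchlist has to
    be reloaded instead.
    """
    end = floor_to_slice(now, slice_size)
    wanted = floor_to_slice(last_poll, slice_size)
    earliest = end - max_lookback
    start = max(wanted, earliest)
    return start, end, wanted >= earliest
```

Two clients calling request_window(1000, 930) and request_window(1019, 955) both get the range 900 to 960, which is what lets a proxy answer the second one from cache.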

Article updated ping

How could we notify a reader when they're looking at a page that new changes have been made while they were looking at it?

This is a simple check, as it is only one page. If we're limiting the solution to a single page we may show a short history; in fact we can leave the page as it is and just change the contentSub to show that we are on an old version. This would imply that the contentSub must be visible at all times.

This is basically just a special case of an event initiated from the recent changes feed and should share the common infrastructure.
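As a sketch of that special case, the same parsed rc-feed entries can be filtered down to the page being viewed. This assumes each entry carries a page id and a revision id, and that a higher revision id means a newer revision; the function names and the notice wording are hypothetical.

```python
def newer_revisions(changes, page_id, loaded_revid):
    """Return the changes to `page_id` that are newer than the revision
    the reader loaded, oldest first."""
    newer = [c for c in changes
             if c["pageid"] == page_id and c["revid"] > loaded_revid]
    return sorted(newer, key=lambda c: c["revid"])


def content_sub_notice(changes, page_id, loaded_revid):
    """Return a short text for the contentSub, or None if the page the
    reader is looking at is still the current version."""
    newer = newer_revisions(changes, page_id, loaded_revid)
    if not newer:
        return None
    return "You are viewing an old version; %d newer revision(s) exist." % len(newer)
```

The list from newer_revisions could also feed the short history mentioned above.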

Trending articles

How might you identify the articles that are currently most interesting to users?

It's a double question: both what's interesting to the user and what's trending. Probably it's only a question of what's trending. What's interesting can be either the watchlist, or visited articles, or both, and it can be a more general Bayesian inference engine or a tf-idf engine, whatever is possible given the available data. What's trending is a lot more troublesome, as any interesting pages have to be compared to page view statistics. Such data isn't easily available, but one solution is hashing article ids into buckets each hour-ish and comparing that to something daily-ish.

More buckets give better caching on the server side, but on the client side no one would see a cache hit anyhow. It is only necessary to expose the processed list of identified trending articles, and this is pretty small.

Trending articles can be driven by ordinary editing and patrolling, and the statistics should be adjusted for this, although stats for this aren't available.

This is more or less the trend-bot used a long time ago at no.wp, revamped as a special page and probably also filtered on a category, which means lots of trouble with modeling and detecting limits for the different categories.
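One possible reading of the bucket idea, as a sketch only: page views are counted per hash bucket rather than per article, and a bucket is flagged when its count for the last hour is well above its hourly average over the last day. The bucket count, the CRC32 hash, the factor and the minimum view count are all assumptions, not anything decided here.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 1024  # hypothetical; more buckets means fewer collisions


def bucket_of(article_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Hash an article id into a bucket with a stable hash (CRC32)."""
    return zlib.crc32(str(article_id).encode("ascii")) % num_buckets


def count_views(article_ids, num_buckets: int = NUM_BUCKETS) -> Counter:
    """Count views per bucket from an iterable of viewed article ids."""
    return Counter(bucket_of(a, num_buckets) for a in article_ids)


def trending_buckets(hourly: Counter, daily: Counter,
                     factor: float = 3.0, min_views: int = 10) -> list:
    """Return buckets whose views in the last hour exceed `factor` times
    their hourly average over the last day."""
    hot = []
    for bucket, views in hourly.items():
        baseline = daily.get(bucket, 0) / 24.0
        if views >= min_views and views > factor * baseline:
            hot.append(bucket)
    return sorted(hot)
```

Several articles share a bucket, so a hot bucket only says that one of its articles may be trending. The candidate pages (watchlist, visited) would be looked up with bucket_of and checked against the hot list.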

Top editors today

How might you determine which editors are having the most impact on Wikipedia today?

What's today, and impact in which fields? It seems like a contribution measure at the editor level, but that leaves the patrolling out, and this would trigger the old discussion at no.wp about what counts as important contributions.

Probably best done as a special page, but that doesn't make for a very fancy user interface. The report could, however, be dynamic with some trend indicators.

This has serious load issues without database changes, aka Bug 21860 - Add checksum field to database table; expose it in API.

Post to social media

How might you enable Wikipedians to share their favorite articles through their social networks, in a way that's consistent with our privacy policy?

As the user posts to his own social network, the post will be visible to the participants in his group. It should, however, be possible to hide the user's social network from other users, even from the server. That is, sharing should be done client side without any round trip to the server. Any leaking of information about the social network will then come from the social network service itself and can't be avoided.

Basically this is solved already as a client side function. Due to license problems only very few of the social networks have logos, and the number of networks is so high that the page is somewhat unreadable. If the buttons load as part of the skin it is perhaps possible to avoid the licensing issue.

Note that the social network group can be detected by correlating activity on links to a specific article from users at a specific social network site. It is possible to hide some of this by removing the referer during requests, but how this should be done is highly service dependent.

At some services it would probably be possible to post references to articles through a secondary interface, probably an app, that only exposes a generic user, but it seems unlikely that this will be a very popular solution.

Page views since last edit

How might an editor be able to see how many people have seen an article since their last changes to it?

This isn't possible with today's setup. It could use the bucket trick described under trending articles.

The page