Requests for comment/URL shortener

This is a request for comment about implementing a URL shortener service for use by Wikimedia projects (42085).

Background
A URL shortener is a service that takes a long URL and returns a shorter one, measured in the number of characters needed to represent it.

There are generally two types of URL shorteners:


 * 1) the http://enwp.org/foo and http://youtu.be/foo kind, which expand the short URL directly into the long one; and
 * 2) the http://ur1.ca/foo kind, which look up a hash or other short code and convert it into the longer form of the URL.

Both of these implementations generally use HTTP 301 server-side redirects.[citation needed?]
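
To make the distinction between the two types concrete, here is a minimal Python sketch. It is only an illustration: the prefix, the lookup table, the code "a1b2" and all function names are made up for this example and do not describe how enwp.org, youtu.be or ur1.ca are actually implemented.

```python
from typing import Dict, Optional, Tuple

# Assumption: a fixed prefix used by a hypothetical direct-expansion shortener.
DIRECT_PREFIX = "https://en.wikipedia.org/wiki/"

# Assumption: a stored mapping used by a hypothetical hash-based shortener.
HASH_TABLE: Dict[str, str] = {
    "a1b2": "https://commons.wikimedia.org/wiki/File:Example.jpg",
}


def expand_direct(path: str) -> str:
    """Type 1: the short path is simply appended to a fixed prefix."""
    return DIRECT_PREFIX + path


def expand_hashed(code: str) -> Optional[str]:
    """Type 2: the short code is looked up in a stored mapping."""
    return HASH_TABLE.get(code)


def redirect_response(target: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers): a 301 to the target, or a 404 if unknown."""
    if target is None:
        return 404, {}
    return 301, {"Location": target}
```

Note that type 1 needs no stored state at all, while type 2 depends entirely on the lookup table surviving.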

Traditionally these types of links have only been needed on external social media services that have arbitrary character limits, such as Twitter. However, the need for them in other contexts has allegedly expanded (see below).

It's also important to note that many wikis block URL shorteners as they're a spam vector (this very page can't have links to youtu.be, for example).

Use-cases

 * Links in Echo notifications that are e-mailed or broadcast via XMPP
 * Neither e-mail nor XMPP have arbitrary character limitations, do they? I don't see the use-case for a shortened URL. --MZMcBride (talk) 20:59, 17 November 2012 (UTC)
 * Links to fundraiser landing pages that are posted to social media or sent via email
 * So Twitter and identi.ca? --MZMcBride (talk) 20:59, 17 November 2012 (UTC)
 * No, not just Twitter and identi.ca. All social media (and regular media) targeted by fundraising. Kaldari (talk) 19:33, 26 November 2012 (UTC)
 * URL sharing via the Mobile App
 * For use by the Wikimedia Foundation Communications Department in tweets (linking to a third-party site can be problematic... though there's apparently ur1.ca?)
 * Not new
 * File sharing from Commons
 * What does this mean, exactly? --MZMcBride (talk) 19:29, 18 November 2012 (UTC)
 * Via 'Email a link' and 'Use This File'. Right now it gives you URLs like https://commons.wikimedia.org/wiki/File%3ACircle_of_the_Limbourg_Brothers_-_Medallion_with_the_Emperor_Augustus's_Vision_of_the_Virgin_and_Child_-_Walters_44462_-_Back.jpg. It would be nice to have a Short URL option.
 * I'd expect the Commons community not to use shortened URLs (even if available) in such a case, unless 1) the legals confirm that "attribution by URL" is also ok with any variation of the URL, 2) the WMF ensures that the service will be maintained forever (which brings us again to the costs considerations). --Nemo 10:50, 20 November 2012 (UTC)
 * Long Gerrit and Bugzilla URLs

Considerations
We have Extension:ShortUrl, which is a solid foundation[citation needed] for creating short hashes of titles. It creates a link to the title (which can be obsoleted by a page move, unlike the page ID). These links are not like our usual permalinks: they don't use the revision ID, so they always point to the latest revision. However, without a dedicated domain to configure it with (more on that below), the short URLs will still be limited by the length of our primary domains.
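
As a rough illustration of what "short hashes of titles" could look like, the following Python sketch assigns each title a sequential number and writes it in base 36. This is an assumption made for the example, not a description of how Extension:ShortUrl is actually implemented; the class and function names are hypothetical.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols: 0-9, a-z


def encode_base36(n: int) -> str:
    """Write a non-negative integer using the 36-symbol alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class TitleShortener:
    """Hypothetical store mapping short codes to page titles (not revisions)."""

    def __init__(self) -> None:
        self._ids = {}     # title -> numeric id
        self._titles = []  # numeric id -> title

    def shorten(self, title: str) -> str:
        # Reuse the existing id if the title has been shortened before.
        if title not in self._ids:
            self._ids[title] = len(self._titles)
            self._titles.append(title)
        return encode_base36(self._ids[title])

    def expand(self, code: str) -> str:
        # The code resolves to a title, so a later page move makes it stale.
        return self._titles[int(code, 36)]
```

Because the code is tied to a title rather than a page ID or revision ID, this sketch has the same two properties noted above: a page move breaks the link, and the link always shows the latest revision.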

Extension
The extension currently has some bugs. Most importantly, it has not yet found a sensible way to link the short URL: 38863.

The current implementation is not deployable anywhere except on wikis desperate to escape their long percent-encoded URLs, because it clutters the interface in an extremely annoying way and gives users no clue about where to find short URLs or which interface elements it adds.

Domain
The main thing needed is allegedly a short domain name. This would most likely have to be donated to us since short domain names aren't cheap.
 * The Wikimedia Foundation has a pretty big budget these days. If it really wanted a short domain, it could buy one. --MZMcBride (talk) 21:16, 17 November 2012 (UTC)

It would be best to use some sort of subdomain or path scheme to make it usable for all Wikimedia Foundation projects:

e.g. lang.abbrev-site. .org/ or abbrev-site. .org/lang/ or .org/abbrev-site/lang/


 * w.org (exists, for sale)
 * w.co (available)
 * w.ly (exists, unavailable)
 * wmf.org (exists, unavailable)
 * wmf.co (exists, for sale)
 * wmf.ly (available)
 * wi.ki (exists, for sale?)
 * w.mf (available)
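
The three subdomain and path schemes mentioned above could all be parsed by one small routine. The Python sketch below assumes a placeholder domain, short.example, which is purely hypothetical (no domain has been chosen), and a hypothetical function name; it only shows that the schemes carry the same information.

```python
from typing import Tuple

BASE_DOMAIN = "short.example"  # Assumption: placeholder, not a real candidate.


def parse_short_url(host: str, path: str) -> Tuple[str, str, str]:
    """Return (lang, site, rest) for any of the three proposed schemes."""
    if host != BASE_DOMAIN and not host.endswith("." + BASE_DOMAIN):
        raise ValueError("not a short-URL host: " + host)
    prefix = host[: -len(BASE_DOMAIN)].rstrip(".")
    labels = prefix.split(".") if prefix else []
    segments = path.lstrip("/").split("/")
    if len(labels) == 2:
        # lang.abbrev-site.DOMAIN/rest
        lang, site = labels
    elif len(labels) == 1:
        # abbrev-site.DOMAIN/lang/rest
        site = labels[0]
        lang = segments.pop(0)
    elif not labels:
        # DOMAIN/abbrev-site/lang/rest
        site = segments.pop(0)
        lang = segments.pop(0)
    else:
        raise ValueError("too many subdomain labels: " + host)
    return lang, site, "/".join(segments)
```

For example, en.w.short.example/Foo, w.short.example/en/Foo and short.example/w/en/Foo would all yield the same language, site and remainder. The subdomain variants need wildcard DNS (and a matching certificate for HTTPS), while the path-only variant needs a single host name.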

And then of course there's the question of protocol: HTTP v. HTTPS.
 * How is that a question? The protocol and the domain are not related to each other, and our servers support HTTPS. They will both work. Krinkle (talk) 23:12, 17 November 2012 (UTC)
 * Question was poor wording on my part. I guess I was thinking about how a separate domain requires an additional SSL certificate. And I wasn't sure it was always a given that every Wikimedia service will support both protocols.
 * In this scheme, I guess http and https would redirect to their corresponding expanded forms. Perhaps I should have written "HTTPS support" and left it at that. --MZMcBride (talk) 07:00, 18 November 2012 (UTC)

Maintenance
Who's going to maintain this service for the indefinite future? Is the Wikimedia Foundation willing to maintain this service forever? If so, who within the Foundation will be in charge of maintenance?

Note that some of the use cases above would make maintaining the service forever a legal obligation: «[...] an alternative, stable online copy that is freely accessible [...]» (Terms of use).

The Wikimedia Foundation currently has a number of services (such as OTRS) that it has difficulty maintaining. Any additional service has real costs (adding features, fixing bugs, etc.). What are the actual costs here?

Obfuscation and mis-use
URL shorteners have a cost: they introduce a middle-man dependency. By including a shortened (hashed) URL, you obfuscate where the underlying content is. If the service is unreachable (offline, broken, down) and there's no dictionary to resolve the URL, the content can be lost or irretrievable.
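
One possible mitigation for the "no dictionary" problem would be to publish the full mapping as a plain dump, so that short URLs can still be resolved if the service itself is down. The Python sketch below is only an assumption about what such a dump could look like (tab-separated code and URL, one pair per line); the function names and the format are hypothetical.

```python
import csv
import io
from typing import Dict


def dump_mapping(mapping: Dict[str, str]) -> str:
    """Serialise code -> URL pairs as tab-separated lines, sorted by code."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for code in sorted(mapping):
        writer.writerow([code, mapping[code]])
    return buf.getvalue()


def load_mapping(text: str) -> Dict[str, str]:
    """Rebuild the dictionary from a dump, without contacting the service."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {code: url for code, url in reader}
```

Anyone holding a copy of the dump could then resolve a short code offline with load_mapping(dump)[code].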

URL shorteners can also be mis-used, such as being included in contexts where there is no legitimate reason to use a shortened URL (such as blog posts or in HTML). Nearly all URLs are clicked or copied and pasted.[citation needed]

Analytics and privacy
We'll want to have some analytics capabilities for whatever shortener we use, and we'll need to make sure that it adheres to our privacy protection policies and expectations.

Approach
Most URL shorteners look at domain hacks, but domain hacks are arguably just a fad. An alternative approach to domain hacks and hashing would be pushing for the implementation of a new protocol such as wiki://. So you'd have something like:


 * wiki://w/en/Barack_Obama

The part following the protocol could follow our current interwiki syntax.

However, this would be a much longer process (convincing Web browsers and the world to adopt the protocol) and would still run into the issues discussed above with regard to youtu.be and enwp.org-type URL shorteners: namely that page titles can be quite long (up to 255 bytes), so you might not ultimately save many characters.
 * Interesting idea, although it seems like the most work to actually implement. Kaldari (talk) 19:32, 19 November 2012 (UTC)
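
The point about long titles can be checked with some quick arithmetic. The Python sketch below compares the length of an ordinary English Wikipedia URL with the wiki:// form shown above; the helper names are hypothetical, and the title is assumed to need no percent-encoding.

```python
def canonical_url(lang: str, title: str) -> str:
    """Ordinary Wikipedia URL for a title."""
    return "https://" + lang + ".wikipedia.org/wiki/" + title


def wiki_scheme_url(lang: str, title: str) -> str:
    """The proposed wiki:// form, with "w" standing for Wikipedia."""
    return "wiki://w/" + lang + "/" + title


def characters_saved(lang: str, title: str) -> int:
    """How many characters the wiki:// form saves over the ordinary URL."""
    return len(canonical_url(lang, title)) - len(wiki_scheme_url(lang, title))


short_title = "Barack_Obama"
long_title = "A" * 255  # a title at the 255-byte limit (ASCII only)

print(len(canonical_url("en", short_title)), len(wiki_scheme_url("en", short_title)))
print(len(canonical_url("en", long_title)), len(wiki_scheme_url("en", long_title)))
```

The saving is a constant 18 characters whatever the title: 42 versus 24 characters for Barack_Obama, but 285 versus 267 for a title at the 255-byte limit, which is only about 6% shorter.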

Plan

 * 42085 (see dependency graph)
 * Simply set up lilurl? Is more needed?
 * Yes, more would be needed at least to replace some of the use cases. When WMF has used shorteners before (for example, in testing Twitter, Facebook and other venues as means of getting donations or getting people to contribute to the projects), the analytics features of the shorteners have been a key feature (which is why WMF hasn't stuck to ur1.ca and similar). So having some of those features in a WMF-run service that also comes with the same privacy protection users expect from us would be much better than a barebones shortener.--Sage Ross (WMF) (talk) 00:30, 20 November 2012 (UTC)
 * Then I guess someone should start a section above about such analytics features. The most sensible/viable option would be to use none, but if they really have to be tied with the shortener then they have to be carefully planned. Otherwise, associating this proposal to privacy drama/bikeshedding seems likely to kill it. --Nemo 10:50, 20 November 2012 (UTC)