Talk:Requests for comment/URL shortener

lilurl
Stand on the shoulders of giants (or maybe in this case, not giants). Use lilurl, which currently powers ur1.ca and others. --MarkTraceur (talk) 02:11, 14 November 2012 (UTC)


 * You don't need to be a giant to write 400 lines of code. The coding style is not at a standard we would accept for MediaWiki. Concurrent inserts are not supported and will result in key conflict errors. The symbol alphabet is short compared to most URL shorteners (36 symbols); I'm not sure if that's deliberate. The input URL is not validated, so it's possible to have it output a Location header with spaces, nulls, etc. There's no localisation.
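A minimal sketch of the kind of input check lilurl is missing. It assumes the shortener should only ever emit absolute http(s) targets made of printable ASCII, so IRIs would have to be percent-encoded before they get here. The function name is hypothetical and is not taken from lilurl or any MediaWiki code.

```python
from urllib.parse import urlsplit


def is_safe_redirect_target(url: str) -> bool:
    """Hypothetical check: may `url` be echoed into a Location header?"""
    # Accept printable ASCII only (0x21-0x7E). This rejects spaces, NUL,
    # CR/LF (header injection) and other control or non-ASCII characters.
    if not url or any(not 0x21 <= ord(ch) <= 0x7E for ch in url):
        return False
    parts = urlsplit(url)
    # urlsplit lowercases the scheme; require an absolute http(s) URL.
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

With this, `https://en.wikipedia.org/wiki/Bat` passes, while a URL containing a space, a NUL or `\r\n`, or a `javascript:` URL, is rejected.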


 * I think the best way for us to implement this would be as a MediaWiki extension. The UI could take advantage of the usual MW facilities, and the redirect could be done with a rewrite rule to a special page, like how we do Wikidata redirects. If it's part of MediaWiki, then it will need very little maintenance. -- Tim Starling (talk) 00:58, 17 July 2013 (UTC)
 * Special:Redirect was designed to provide an extensible base for this sort of thing. You'd just add a 'by hash' method or some such. cscott (talk) 22:32, 24 September 2013 (UTC)

Extension:ShortURL
I think it's better than lilurl, but it is limited to redirecting to canonical URLs of articles: it can't redirect to an arbitrary URL, and it can't redirect to a special page. Like lilurl, it is limited to base 36. -- Tim Starling (talk) 01:27, 17 July 2013 (UTC)
 * base36 was an explicit 'design' decision at the time I wrote it - to avoid issues with 'is that upper case or lower case?'. Yuvipanda (talk) 10:01, 25 September 2013 (UTC)
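The base 36 vs. base 62 trade-off can be put in numbers: k symbols from an alphabet of size b give b^k distinct codes, so each base-62 symbol carries only about 15% more information than a base-36 one (log 62 / log 36 ≈ 1.15). A small sketch for comparing code lengths; the function name is ours, for illustration only.

```python
import math


def chars_needed(max_id: int, base: int) -> int:
    """Smallest k such that every ID from 1 to max_id fits in k base-`base` symbols."""
    k = 1
    # k symbols can represent 0 .. base**k - 1, so we need base**k > max_id.
    while base ** k <= max_id:
        k += 1
    return k


if __name__ == "__main__":
    print(f"bits per symbol: base 36 = {math.log2(36):.2f}, base 62 = {math.log2(62):.2f}")
    for n in (10**8, 10**9):
        print(n, chars_needed(n, 36), chars_needed(n, 62))
```

For 100 million IDs this gives 6 characters in base 36 versus 5 in base 62; at a billion IDs both need 6. In exchange, base-36 codes can be treated case-insensitively, which is the point Yuvipanda makes.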

Tim's implementation suggestion

 * A MediaWiki extension.
 * Have a special page UI similar to lilurl etc.: ask the user to submit a long URL, get a small URL back
 * Accept only valid input URLs under WMF-controlled domains, to avoid the maintenance overhead which would come from widespread non-WMF use.
 * Also provide an API module, so that JS can fetch and display a small URL for the current page.
 * Host the redirects at a short domain name, to be purchased.
 * Use a rewrite rule to map short URLs to special page requests, for redirection.
 * Implement using a MySQL table with an autoincrement ID. The ID is converted to a larger base for use in the short URL, similar to Extension:ShortURL.
 * Use base 62 (uppercase, lowercase, digits) or higher.
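A minimal sketch of the ID-to-code scheme in the last two points, using Python's sqlite3 in-memory database as a stand-in for the proposed MySQL table. The table name `shorturls`, the alphabet ordering and all function names are assumptions for illustration; URL validation and the WMF domain whitelist are deliberately left out.

```python
import sqlite3
from typing import Optional

# Base 62 alphabet: digits, uppercase, lowercase (the ordering is our choice).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62
MAX_ROWID = 2**63 - 1  # largest SQLite INTEGER


def encode_id(n: int) -> str:
    """Convert a non-negative integer ID to its base-62 code."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_code(code: str) -> int:
    """Inverse of encode_id; raises ValueError on empty or invalid codes."""
    if not code:
        raise ValueError("empty code")
    n = 0
    for ch in code:
        idx = ALPHABET.find(ch)
        if idx < 0:
            raise ValueError(f"invalid symbol {ch!r}")
        n = n * BASE + idx
    return n


def open_store() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the proposed MySQL table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE shorturls ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " url TEXT NOT NULL)"
    )
    return db


def shorten(db: sqlite3.Connection, url: str) -> str:
    """Store url; the database assigns the next ID, which becomes the code."""
    cur = db.execute("INSERT INTO shorturls (url) VALUES (?)", (url,))
    return encode_id(cur.lastrowid)


def expand(db: sqlite3.Connection, code: str) -> Optional[str]:
    """Look up the long URL for a code, or None if it is unknown or malformed."""
    try:
        row_id = decode_code(code)
    except ValueError:
        return None
    if row_id > MAX_ROWID:
        return None
    row = db.execute("SELECT url FROM shorturls WHERE id = ?", (row_id,)).fetchone()
    return row[0] if row else None
```

Because the database rather than the application picks the ID, concurrent inserts each get a distinct key, which sidesteps the key-conflict problem noted for lilurl above.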

The idea is to avoid any conceivable use case for external URL shorteners. External URL shorteners are a privacy and reliability concern, and so we should replace them with something in-house. It may be true that many uses of URL shorteners are inappropriate; it may even be that they are entirely redundant and should be discouraged in the strongest terms. However, discouraging them on this RFC is not going to stop them from being used. It's a small project, there are clear benefits, so we should just do it.

RFC authors, please integrate this implementation suggestion with the RFC page if you agree with it. -- Tim Starling (talk) 01:27, 17 July 2013 (UTC)
 * Would we provide wikidumps of the MySQL table? And de-dup entries?  Are there privacy implications (you can find out if someone has already shortened a particular URL, and roughly when)?  And please consider reducing the set of characters to remove easy-to-confuse characters, since one of the points of a shortener is to work in environments where the user has to manually type in the URL. I recommend extending Special:Redirect, but cross-wiki redirects would be a New Thing. cscott (talk) 22:32, 24 September 2013 (UTC)
 * Also, I'm a little concerned with non-en wikis. For example, zhwiki uses the first component of the path to give language variant.  We should make sure that the short link for https://zh.wikipedia.org/zh-hant/User:Cscott (for example) doesn't lose any of its components. cscott (talk) 22:33, 24 September 2013 (UTC)
 * If we just redirect based on the URL itself (rather than a namespace:title combo), I don't think language variants would be a problem. We really should be just using the URL, to support special pages with query strings, fragments, etc. Yuvipanda (talk) 10:05, 25 September 2013 (UTC)
 * Or gerrit changes, or a dozen other non-wiki things (as was said yesterday). Yeah doing it by URL rather than anything relating to titles is way saner. ^demon[omg plz] 22:21, 25 September 2013 (UTC)
 * Where exactly would such an extension live? I'm going to guess it should be deployed just once, rather than per wiki, assuming that it is mapping autoincrement IDs to URLs. Meta? This would also have complications for the API. Overall this is a proposal I quite like, although I'm not sure it has to be a MediaWiki extension. Yuvipanda (talk) 10:05, 25 September 2013 (UTC)
 * I'm not entirely convinced it has to be MediaWiki either, but I'm not opposed to it. ^demon[omg plz] 22:21, 25 September 2013 (UTC)
 * The compelling case for doing this in MediaWiki seems to be the API call and the fact that you can get that easily server-side, but I'm not sure we can do that while also easily making this cross-wiki. How about a small daemon doing the generation plus an API endpoint, with Varnish doing most of the actual redirecting? We could have a heavily cached API module available too! Yuvipanda (talk) 08:44, 26 September 2013 (UTC)
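cscott's point about easy-to-confuse characters has a well-known existing answer in Douglas Crockford's Base32, which leaves out I, L, O and U and decodes the look-alikes O → 0 and I/L → 1 case-insensitively. Whether to adopt it here is an open question. The sketch below only illustrates such forgiving decoding; the function names are hypothetical, and Crockford's optional check symbol is omitted.

```python
# Crockford's Base32 symbols: digits plus letters, minus I, L, O and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
# Look-alikes a human might type, mapped to the symbol they resemble.
_LOOKALIKES = str.maketrans("OIL", "011")


def crockford_encode(n: int) -> str:
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 32)
        out.append(CROCKFORD[rem])
    return "".join(reversed(out))


def crockford_decode(code: str) -> int:
    # Normalise what a reader may have typed: any case, hyphens, O/I/L look-alikes.
    s = code.upper().replace("-", "").translate(_LOOKALIKES)
    if not s:
        raise ValueError("empty code")
    n = 0
    for ch in s:
        idx = CROCKFORD.find(ch)
        if idx < 0:
            raise ValueError(f"invalid symbol {ch!r}")
        n = n * 32 + idx
    return n
```

So a code printed as `10` still resolves if someone types `1o`, `IO` or `l0`, at the cost of a slightly longer code than base 62 would give.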

IRC meeting 2013-09-24
<MaxSem> I'm in general in favor of https://www.mediawiki.org/wiki/Requests_for_comment/URL_shortener#Full_url_mapping
<TimStarling> I have some strong opinions on URL shortening which I've expressed on that RFC talk page
* brion reads the url shortener...
<Elsie> Can we merge the two RFCs?
<Elsie> I think there are two.
<TimStarling> see section "Tim's implementation suggestion"
<Elsie> https://www.mediawiki.org/wiki/Requests_for_comment/URL_shortener_service_for_Wikimedia
<Elsie> https://www.mediawiki.org/wiki/Requests_for_comment/URL_shortener
<^d> I think the two should be merged, and I like Tim's idea.
<TimStarling> the second was created 2 days ago?
<Elsie> TimStarling: When you say a MediaWiki extension, you mean in addition to ShortUrl?
<Elsie> TimStarling: It seems so, yes.
<ori-l> could https://www.mediawiki.org/wiki/Extension:ShortUrl be adapted to Tim's specifications?
<Nemo_bis> Elsie: they only share 100% of the first 34 chars in the title
<RoanKattouw> ori-l: Probably yes
<TimStarling> ShortURL was doing something different to what I suggested
<^d> Just rewrite it then.
<RoanKattouw> It already does some of those things, this RFC has a slightly wider scope
<parent5446> sumanah: thanks
<Nemo_bis> Just ensure to kill the link below the page title
<brion> ok i'm just a little lost on what exactly's being proposed by the URL shortener rfc
<Elsie> ^d: s/rewrite/improve/
<Elsie> brion: Generally? People want stable shorter URLs.
<Elsie> For Wikimedia resources.
<sumanah> (especially for non-Latin charsets)
<sumanah> (iiuc)
<^d> Finding a suitable short domain that would work for all projects is hard.
<^d> I've long suggested wi.ki, but Kiribati domains are horribly expensive last I saw.
<brion> specifically, are we talking "short URLs for MediaWiki pages" or "a general shortener that accepts any URL and redirects to it"?
<Elsie> brion: See Tim's third point.
<TimStarling> well, ShortUrl is the former
<bd808> wmf.co is up for auction at godaddy
<sumanah> page 18 of https://commons.wikimedia.org/wiki/File:WMF%27s_New_Global_South_Strategy.pdf mentions various Indic languages, Vietnamese, Tagalog, Bahasa... it's really helpful to some of these folks to have shortened URLs to pass around
<TimStarling> and I am suggesting the latter, except for WMF domains, not any domain
<^d> brion: A general shortener but for all things wmf.
<^d> Is how I understand it.
<Elsie> https://www.mediawiki.org/wiki/Talk:Requests_for_comment/URL_shortener#Tim.27s_implementation_suggestion
<RoanKattouw> Extension:ShortURL is page ID-based, IIRC, so it's per-wiki and only does links to full pages
<^d> Yes, Tim's idea.
<RoanKattouw> Tim's idea is a ur1.ca-like thing except that it limits the domains you can link to
<^d> Right, I like that.
<RoanKattouw> (but allows arbitrary URLs within those domains, not just wiki pageS)
<Elsie> I may re-arrange the Etherpad to put the resolved RFCs on the bottom.
<brion> *nod* ok makes sense
<TimStarling> ShortURL is not page_id based, it is namespace/dbkey
<RoanKattouw> Oh, my apologies
<AaronSchulz> I guess Tim's idea is fine though don't like shorteners in general
<TimStarling> at least in the version I have in front of me
<tyteen4a03> Krenair, oh sorry, nope
<RoanKattouw> I think there was a point in time that it was page IDs, or where I thought it should be page IDs for some reason
<tyteen4a03> Krenair, just a passerby - ignore me :)
<RoanKattouw> I reviewed that extension but it was a long time ago
<sumanah> Krenair: parent5446 is Tyler
<Krenair> okay :)
<RoanKattouw> Re https://www.mediawiki.org/wiki/Requests_for_comment/URL_shortener_service_for_Wikimedia, that shoulld probably be folded into the larger RFC
<^d> "The fee for a second-level domain is A$1,000"
<TimStarling> ok, so you think I should rewrite this RFC into something reflecting my proposal?
<^d> That's not bad :)
<RoanKattouw> which is more complete
<sumanah> hey subbu & gwicke - want the logs of the chat till now? :)
<TimStarling> and then have someone accept it?
<brion> ok… so it feels like we have tim's counter-proposal and …. what tim just said :D
<kylu> out of curiosity, why don't we just have a deal in place with bit.ly like they have for 1.usa.gov ?
<gwicke> sumanah: do you have a link?
<sumanah> gwicke: no, I'd be emailing you a transcript
<^d> TimStarling: Go for it, I'm totally on board with this one.
<RoanKattouw> TimStarling: I think you should start writing it out as a separate section or subpage, probably?
<subbu> sumanah, sure
<subbu> thanks
<gwicke> subbu: that would be handy
<brion> kylu: we have endemic NIH syndrome, based in part on our desire to be self-sufficient and ensure that data remains open :)
<TimStarling> well, an RFC is meant to be a single proposal that is accepted or rejected as a whole
<RoanKattouw> Right, yeah
<Elsie> TimStarling: Does your proposal allow for discerning the target from the short URL?
<RoanKattouw> This is a great process question, BTW
<TimStarling> so it should be modified until it can be accepted
<Elsie> Obfuscation is an issue.
<RoanKattouw> If someone has a substantially different proposal that accomplishes the same goal, should they rewrite the RFC page, or should they write a new one, or what?
<TimStarling> Elsie: you mean an API?
<legoktm> Elsie: Ideally there would be an API that you can pass the shorturl too, and would return the full one
<brion> TimStarling: agreed; my recommendation is we move the rfc into an 'editing' state while you tweak it
<Elsie> TimStarling: I mean /ddsfiodsjf is meaningless.
<cscott> so, would we dump the url-shortener tables from our db?
<Elsie> While a derived key might be more useful /1234.
<cscott> and/or dedup them?
<kylu> brion: dankon.
<Elsie> Where 1234 is a page ID, for example.
<^d> What about linking to things that don't have pageids?
<brion> cscott: something like that yeah (providing data dumps for the redirection)
<Elsie> ^d: Better defined use-cases would be nice, yes.
<brion> ^d: i think utility is maximized by taking in arbitrary URLs, which may include parameters
<Elsie> There's always, y'know, the regular URL.
<brion> just limiting to certain domains
<TimStarling> cscott: I don't think the extension is currently deployed
<^d> brion: That's my point.
<^d> I agree with Tim and you.
<brion> excellent
<sumanah> it is on tawiki, isn't it?
<^d> But Elsie is saying it'd be nice to have the short url mean something.
<manybubbles> Elsie: I think the de-obfuscation is less important since we're limiting it to wmf urls - but still worth doing.
<AaronSchulz> ShortUrl?
<^d> Which is hard, if we're allowing things that aren't normal pages.
<Elsie> ShortUrl is deployed to a few Wikimedia wikis, yes.
<AaronSchulz> I fixed memcached errors with it, so it must be running ;)
<sumanah> yeah, hiwiki, orwiki, tawiki, some others
<Elsie> ^d: Right. enwp.org/foo, even if it stops working, can still be deciphered.
<brion> making the short urls look meaningful is tricky… but one could devise some ideas. it's worth considering as an adjunct
<Elsie> That's a nice protection feature.
<brion> consider also intl issues
<Elsie> We could also make dumps available of the key-values.
<Elsie> If we consider URL titles non-private.
<brion> definitely dumps yes
<Elsie> I'm not sure how private wikis would fit in.
<Elsie> Or links to secret Etherpads. ;-)
<^d> Elsie: But, how do you link to something that's not a wikipage? Like a thumb on commons. Or a page in ganglia. Or something in gerrit?
<brion> don't shorten those ;)
<brion> (secret things)
<csteipp> people will...
<brion> csteipp: not if we block m
<Elsie> ^d: I dunno. Perhaps b for Bugzilla, g for Gerrit... or perhaps we give up on this particular goal. Not sure.
<gwicke> how about using something like enwp.org/foo and deduplicate with a hash suffix for longer urls?
<cscott> TimStarling: no, i mean that keeping the service up "indefinitely" is made easier if archive.org etc can archive our complete table of redirects.
<csteipp> brion: Yep, I'm all for that
<^d> Elsie: We're going to run out of letters :p
<TimStarling> if we rewrite ShortUrl, we can just migrate the existing table into one with a URL as a value instead of namespace/title
<Elsie> We're already at the half-hour mark.
<Elsie> We should move on.
<gwicke> that way urls would be both length-limited and somewhat readable
<Elsie> (In my opinion.)
<cscott> i added some comments to the talk page.
<brion> agreed w/ moving on
<sumanah> I propose that Tim or Brion ask the proposers of those RfCs (on URL shorteners) to look at this discussion & combine/respond
<brion> next steps on this: tim to update the rfc, then we ping mailing list for more discussion. yes?
<TimStarling> yes
<brion> *nod*
<cscott> sure.
<MaxSem> +1
<Krenair> okay
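gwicke's suggestion in the log (a readable key such as enwp.org/foo, made unique with a hash suffix when the URL is long) might look something like the sketch below. The 16-character limit, the 6-hex-digit SHA-1 suffix and the hyphen separator are arbitrary assumptions, not anything agreed in the meeting.

```python
import hashlib


def readable_key(title: str, max_len: int = 16, suffix_len: int = 6) -> str:
    """Hypothetical readable short key: the title itself if short enough,
    otherwise a truncated prefix plus a short hash of the full title."""
    slug = title.replace(" ", "_")  # MediaWiki-style spaces-to-underscores
    if len(slug) <= max_len:
        return slug
    digest = hashlib.sha1(slug.encode("utf-8")).hexdigest()[:suffix_len]
    prefix = slug[: max_len - suffix_len - 1]  # leave room for "-" + digest
    return f"{prefix}-{digest}"
```

Short titles map to themselves (`Bat` stays `Bat`), so the key remains decipherable even if the service goes away, as Elsie noted; long titles keep only a readable prefix. Non-Latin titles would also grow once percent-encoded, which is one of the intl issues brion raised.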

Tracking vs Human readable
Things like w.org/sdf92d are great for tracking the effectiveness of a share, but they still aren't very human-readable. Would we reserve w.org/bats for wikipedia.org/wiki/Bat?

How would we handle cases where the page title has already been used, leaving a mismatch between wikipedia.org/wiki/XXX and w.org/XXX? — Jaredzimmerman (WMF) (talk) 20:09, 30 October 2013 (UTC)

Simple short-url linking script
I've made a short script at wikipedia:User:Joeytje50/shortLink that uses http://enwp.org (and http://frwp.org, the only other Wikipedia URL shortener I could easily find) to automatically find the shortest URL that leads to the page you're viewing, including via redirects. There is currently nothing in the script that checks whether a redirect points to the page itself or to a section, so some of the given URLs might lead to a section. The script shows the shortest URL plus all URLs that are less than 5 characters longer than it; if that would result in fewer than 10 URLs, it keeps going until there are more than 10 URLs, or until all redirects have been added.

The other language Wikipedias do get a button in their toolbox that does this, but those would work via enwp.org interlanguage links. Joeytje50 (talk) 21:19, 29 December 2013 (UTC)
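The selection rule described above, restated as a small Python sketch. The actual script is JavaScript on-wiki and is not reproduced here. This version assumes the candidate URLs (such as enwp.org/<redirect title>) have already been collected, and it simplifies the "fewer than 10 / more than 10" wording into topping the list up to `min_count` entries.

```python
from typing import Iterable, List


def pick_short_links(urls: Iterable[str], slack: int = 5, min_count: int = 10) -> List[str]:
    """Shortest URL plus all URLs less than `slack` characters longer;
    topped up in length order to `min_count` entries when that is too few."""
    ranked = sorted(dict.fromkeys(urls), key=len)  # de-duplicate, then shortest first
    if not ranked:
        return []
    cutoff = len(ranked[0]) + slack
    chosen = [u for u in ranked if len(u) < cutoff]
    if len(chosen) < min_count:
        # `chosen` is a prefix of `ranked`, so this only adds longer URLs.
        chosen = ranked[:min_count]
    return chosen
```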

Pro tem / prototype
I have implemented a prototype scheme that does not address the privacy concerns, but I can implement (or guide the implementation of) such a version for WMF servers, given some guidance on the complete requirements.

To use this prototype implementation, the steps are:
 * 1) Create a Bitly short URL at https://bitly.com/, yielding bit.ly/{HASHCODE}
 * 2) Use URL wmfsl.org/.{HASHCODE}

Notice the required dot prefix on the hash code in this pro tem version, added to avoid collisions with codes from alternative implementations. (The thought is that these pro tem codes would be carried forward after the formal rollout.)

Example:
 * Long URL: http://www.mediawiki.org/wiki/Requests_for_comment/URL_shortener
 * Short Bitly URL: http://bit.ly/1kOOHqn
 * Short WMF URL: http://wmfsl.org/.1kOOHqn


 * NOTE 1: This uses the backend Bitly API. Consequently, the normal Bitly statistics/click counts do not get updated.
 * NOTE 2: This will only work for targets hosted at domains (or subdomains of) wikimedia.org, mediawiki.org, and wikipedia.org.
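The two rules of the prototype, the dot-prefixed path and the domain restriction in NOTE 2, can be sketched as follows. This is not the wmfsl.org code: the Bitly lookup that sits between the two checks is omitted, the names are hypothetical, and it assumes Bitly hash codes are alphanumeric.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

ALLOWED_DOMAINS = ("wikimedia.org", "mediawiki.org", "wikipedia.org")
# Pro tem paths look like "/.1kOOHqn": a dot, then the Bitly hash code.
_PRO_TEM_PATH = re.compile(r"/\.([A-Za-z0-9]+)")


def pro_tem_hash(path: str) -> Optional[str]:
    """Return the hash code from a dot-prefixed path, or None."""
    m = _PRO_TEM_PATH.fullmatch(path)
    return m.group(1) if m else None


def is_allowed_target(url: str) -> bool:
    """True if url is http(s) on an allowed domain or one of its subdomains."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").rstrip(".")  # .hostname is already lowercased
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
```

Matching on a leading "." plus the domain, rather than a plain suffix, keeps look-alike hosts such as evilwikipedia.org or wikipedia.org.example.com from slipping through.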

What's missing? What's the next step to get this or something else rolling formally? —Danorton (talk) 22:10, 17 February 2014 (UTC)

Merging of two proposals
Just a minor point, certainly, but I wanted to mention it anyway. Since we are merging the two similar proposals being discussed, I propose the name "URL Shortener Service", or USS for short.

The main point from the other proposal that I want to emphasize is the need for the URL shortener to be compatible with QR code character restrictions, for more universal applications of this endeavor. The section #Wiki identifier (2): Map wiki-id and the accompanying WikiMap in that proposal explain the rationale behind this idea.
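For context: QR codes have an alphanumeric mode that stores text more compactly than byte mode, but it covers only 45 symbols (0-9, A-Z, space and $ % * + - . / :). Scheme and host names are case-insensitive, so a short URL can fit that mode if its path code is uppercase-only, for example base 36 written in capitals. A hypothetical check, assuming bare short URLs with no query string or fragment:

```python
from typing import Optional
from urllib.parse import urlsplit

# The 45 characters allowed in QR code alphanumeric mode.
QR_ALPHANUMERIC = frozenset("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def fits_qr_alphanumeric(text: str) -> bool:
    return all(ch in QR_ALPHANUMERIC for ch in text)


def qr_friendly_form(short_url: str) -> Optional[str]:
    """Uppercase the case-insensitive parts of a bare short URL and return it
    if it fits QR alphanumeric mode, else None."""
    parts = urlsplit(short_url)
    if parts.query or parts.fragment:
        return None
    # Scheme and host are case-insensitive, so uppercasing them is safe;
    # the path is case-sensitive and is left untouched.
    candidate = f"{parts.scheme.upper()}://{parts.netloc.upper()}{parts.path}"
    return candidate if fits_qr_alphanumeric(candidate) else None
```

A base-62 code such as the Bitly-style 1kOOHqn cannot be uppercased without changing its meaning, so it would fall back to the less compact byte mode.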

Might I also suggest including the short URL in the rendered page code, perhaps as a meta tag? That way apps etc. would have the short URL without sending additional queries.

-- とある白い猫 chi? 17:05, 10 April 2014 (UTC)

Current status
Yuvi said that this changeset "is the initial implementation of the ShortURL RfC, have clear steps forward. am getting plenty of review from legoktm and MaxSem :) I'll setup a test instance on labs soon". Sharihareswara (WMF) (talk) 17:39, 12 June 2014 (UTC)
 * Wanted to drop this here in case having another implementation to look at is helpful: https://github.com/praekelt/url-shortening-service/ Sharihareswara (WMF) (talk) 18:39, 7 July 2014 (UTC)
 * Also, https://gerrit.wikimedia.org/r/#/c/139054/ is merged. Sharihareswara (WMF) (talk) 18:40, 7 July 2014 (UTC)