Extension:EventLogging/Campaigns

Summary
When a user follows a URL that contains ?campaign=someName, and someName is a valid campaign identifier, then this:
 * 1) logs a campaign event
 * 2) stores the campaign identifier in the user's session cookie, so that if they sign up, edit, or take another action within the session, the campaign is logged with that action

Use cases

 * A. Why is someone creating an account?
 * There are lots of paths into creating an account, but we can't tell which are effective.


 * B. Invitation to participate in a project, e.g. PhilippinesOutreach
 * We have a promotion that goes to on-wiki page(s) (not necessarily the Create account form). We'd like to know how many people followed it, and associate any account creations with the campaign.


 * C. Present special material during and after account creation and/or login.
 * If we know someone created an account or logged in to participate in the PhilippinesOutreach, then we can do better than presenting them the default GettingStarted page; we can give them special messaging.

Sketch of design
An earlier version of this was implemented as an addition to the EventLogging extension, with lots of TODOs.

On any wiki page request, client-side JS
 * Notices URL parameter ?campaign=someName (apsiw, PhilippinesOutreach2013, fromredlink, fromtutorial, etc.)
 * Logs this as a simple event according to the Campaigns schema:
 * campaign someName (campaign's an enum, so must match).
 * user session token (if anonymous) or userId (if logged-in)
 * the namespace and title of the page.
 * (Analytics code could process this to apply usertagging)
 * If the campaign event validates, stores the campaign someName in a mediaWiki.campaign session cookie.
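
The client-side steps above can be sketched as follows. This is a minimal, language-neutral illustration written in Python, not the extension's actual code: the contents of VALID_CAMPAIGNS, the event field names (userId, token), the function name, and the use of a plain dict in place of the browser's session cookies are all assumptions made for the example.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical enum; the real list would live in the Campaigns schema.
VALID_CAMPAIGNS = {"apsiw", "PhilippinesOutreach2013", "fromredlink", "fromtutorial"}
COOKIE_NAME = "mediaWiki.campaign"


def handle_page_request(url, namespace, title, session_cookies,
                        user_id=None, session_token=None):
    """Return the campaign event to log, or None if nothing valid was found.

    session_cookies is a plain dict standing in for the session cookies.
    """
    values = parse_qs(urlsplit(url).query).get("campaign")
    if not values:
        return None  # no ?campaign= parameter on this request
    campaign = values[-1]  # assumption: if the parameter repeats, the last wins
    if campaign not in VALID_CAMPAIGNS:
        return None  # campaign is an enum, so unknown names do not validate
    event = {"campaign": campaign, "namespace": namespace, "title": title}
    # Logged-in users are identified by userId, anonymous users by token.
    if user_id is not None:
        event["userId"] = user_id
    else:
        event["token"] = session_token
    # Only a validated event sets the cookie.
    session_cookies[COOKIE_NAME] = campaign
    return event
```

Calling handle_page_request with a URL ending in ?campaign=fromredlink returns an event and sets the cookie; a URL with an unlisted campaign name returns None and leaves the cookies untouched.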

When a user creates an account, the existing ServerSideAccountCreation event will log the mediaWiki.campaign cookie, along with token and new userId, so we can directly relate account creations to the most recent campaign the user clicked.

Open issues
Do we heed the user's "Exclude me from experiments" preference?
 * Yes. Note that it's moot for new users, since they haven't expressed a preference.
 * No. Ori comments: "the 'vector-noexperiments' preference has been stretched into a generic 'do not track' preference in the past, but no one accepts it as adequate for that purpose."

Do we attempt to rewrite browser history state to remove ?campaign=someName in the browser's location field?
 * + Stops users from unintentionally propagating campaigns by bookmarking or sharing the landing URL.
 * - Maybe we want to know every time anyone goes to a ?campaign=someName URL on the wiki, regardless of how they got the link.
 * Note that when we remove a campaign from the enumeration in m:Schema:Campaigns, it will no longer log events.
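
If the parameter is removed, the rewritten address could be computed along these lines. This is only a sketch of the URL manipulation, with a hypothetical function name; in the browser the result would still have to be applied to the location field, for example through the History API.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_campaign(url):
    """Return url with every campaign= query parameter removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != "campaign"]
    # Other parameters and the fragment are preserved; urlencode may
    # normalize how the remaining values are percent-encoded.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, strip_campaign("https://example.org/w/index.php?title=Foo&campaign=apsiw&action=edit#top") yields "https://example.org/w/index.php?title=Foo&action=edit#top".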

Discussion
Names for the project, for the query string, for the cookie
 * For now RoCA (Return of Campaign Awareness), ?campaign, and mediaWiki.campaign. We considered and discarded ?c=foo, ?camp=foo and scamp™.

Why always set a session cookie?
 * It's the easiest way to associate a campaign that lands users on some wiki page with later account creation, and this is the most common and important use case.

Why client-side JS? We could log this on the server.
 * Client-side code already sets the user session token. Also, bots and people who don't want it to be known that they participated in a campaign have some overlap with people who disable JS.

Is enumerating all the currently valid campaign values in the Schema a good idea?
 * + stops people fabricating Rickroll links that set campaign cookie to troll values
 * + forces people to think about the campaigns they want in advance
 * + lets us invalidate campaigns
 * + [View history] provides a history of campaigns
 * + Adds a campaign gatekeeper: WMF staff can't just add ?campaign=SFMeetupHackers to their URLs.
 * - updating the schema rev in PHP will be constant busy-work
 * any group who cares can add an automated workflow
 * - a lot(?) of churn in the schema, there will be a new Schema table each time.

What about multiple campaigns?
 * If the user follows one link with ?campaign in it and then clicks another link with ?campaign, the second replaces the first in the session cookie; only the most recent is associated with the account creation. This is OK since the thing that encouraged account creation is the "closest" campaign, but it means we can't abuse ?campaign to track all the ways users arrive at the Create account form. We could enhance the cookie to store multiple campaigns, but YAGNI.
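
A small sketch of the "most recent campaign wins" behaviour and how it surfaces at account creation. The function names and event field names are hypothetical, and a dict again stands in for the session cookies.

```python
COOKIE_NAME = "mediaWiki.campaign"


def follow_campaign_link(session_cookies, campaign):
    """A later campaign replaces any earlier one in the same session."""
    session_cookies[COOKIE_NAME] = campaign


def account_creation_event(session_cookies, token, user_id):
    """Build the fields an account creation event might carry."""
    return {
        "token": token,
        "userId": user_id,
        # None when the user followed no campaign link in this session.
        "campaign": session_cookies.get(COOKIE_NAME),
    }
```

If a user follows a PhilippinesOutreach2013 link and then a fromredlink link before signing up, the account creation event carries only fromredlink; the earlier campaign is lost, which is the trade-off accepted above.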

Question: should we log campaign cookie contents in case there's already something in it and we overwrite?
 * No, nobody has time to puzzle out this crap!

Question: do we overload this to handle userbucketing for A/B tests?
 * No, we have mw.user.bucket for that.

Question: should the camp cookie be prefixed with "mediaWiki" or the wgDbName (enwiki.camp, dewiki.camp)?
 * Just mediaWiki.campaign

Idea: put this in a general-purpose session cookie with other session info.
 * YAGNI for now.

Issue: If someone lands on login with a campaign and instead creates an account, or vice-versa, we might want to pass the campaign in the link to the other form.

Question: do we enable this for mobile?
 * Sure, why not?

Use cases vs. sketch
A. External calls to action that link directly to account creation work great: they just append ?campaign=fundraiserCTA5.

We could append ?campaign=redlink, ?campaign=WikipediaTutorial, etc. to each of the internal links to signup. However, since the campaign session cookie only tracks one campaign, this would overwrite any campaign that brought the user to some page on the site (see Use case B).
 * Maybe the link by which users arrive at the Create account form should be separate from campaign. Each of the links to create an account would have a via={searchRedLink, WikipediaTutorial, topNavLink, etc.} query string parameter, and account creation would store this in a hidden field that ServerSideAccountCreation later logs.

B. Add ?campaign=PhilippinesOutreach to the links to the target pages, and anyone who follows them will be logged. For any users that create an account during the session, their m:Schema:ServerSideAccountCreation event will log the campaign.

C. Umm, we can offer a custom on-wiki message and/or modify GettingStarted to present something different based on campaign session cookie. Steven Walling cautions that reshaping experience based on campaign by redirecting the user and manipulating pages has NOT worked well in the past (see Account Creation Improvement Project).

Do we want to enforce valid campaigns by enumerating them in the schema?

 * + simpler not to
 * + avoids deployment problems
 * +/- leads to free for all
 * - no registry of valid campaigns (but a registry wouldn't stop someone misusing or copy-pasting the URL of a valid campaign somewhere else)
 * - some risk of someone in a session clicking on an old link that has an invalid ?campaign in it that usurps a valid campaign

Decision: don't validate the campaign; any string will log an event and set the cookie. S will probably enforce a maximum length of 30.
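
Under this decision the check reduces to a length limit. A minimal sketch, assuming the limit counts characters and that an over-long value is ignored rather than truncated (the notes do not say which); the function name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

MAX_CAMPAIGN_LENGTH = 30  # the limit mentioned in the decision


def campaign_from_url(url):
    """Return the campaign string to log, or None if absent or too long."""
    values = parse_qs(urlsplit(url).query).get("campaign")
    if not values:
        return None  # parameter missing or empty
    campaign = values[-1]
    # Assumption: reject over-long values instead of truncating them.
    if len(campaign) > MAX_CAMPAIGN_LENGTH:
        return None
    return campaign
```

With no enumeration, a value such as SFMeetupHackers is accepted as-is, which is the "free for all" noted in the list above.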

Do this everywhere or just on account creation?
Ori-l: why do this everywhere? Why not target just AccountCreation where we know what we're doing and we have clear use cases?
 * That means we can't point Archaeologists to the Archaeology project landing page with ?campaign=Arch2013. We'd have to add ?campaign= to each of the links to create an account on the landing page.


 * S: Sure, it's doable, back to ACUX. It remains a general-purpose feature but we limit its scope by only checking for ?campaign=FINTO on account creation pages.
 * DarTar points out it's not the same as what we did for ACUX because we aren't remembering campaign permanently in userbuckets.

Separate parameter (?casrc) for account creation?
We're sticking with one campaign at a time. The most likely conflict will be a campaign to bring people to the site getting usurped by instrumentation tracking how people wind up creating an account. Why not plan for this from the start by separating "came into site, anywhere, with a ?campaign= URL" from "arrived at account creation with a ?casrc=redlink parameter"?
 * No, stick with ?campaign= only on account creation events. If and when we broaden the scope, we can split them off.

Ori-l: don't set a campaign session cookie?

 * + If we only log campaigns on the Create account URL, then we could drop the campaign cookie and instead add a hidden form field to Create account.
 * - DarTar: no, we need a session cookie in case you leave account creation and come back.

Decision: DarTar seems dispositive, so we'll continue to set a DBName.campaign session cookie.

Remove the ?campaign from the URL

 * Ori says yes.
 * StevenW says sure, it reduces false positives where someone shares the URL with others.
 * Matt says it might be useful to know someone shared a campaign URL, but then a campaign event doesn't mean "User clicked on a URL with a campaign" but "User somehow got hold of a URL with a campaign in it."
 * DarTar: depending on legal, maybe we shouldn't remove it. For example, someone uses a URL shortener like bitly, which obscures ?campaign=FundRaiser in a URL; they come to our wiki, we remove ?campaign=FundRaiser, and the user doesn't know we're tracking.

JavaScript or PHP
Ori and Matt are still in favor of a PHP solution. We could still remove ?campaign from the URL by issuing a 301 redirect.
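
The server-side variant could look roughly like this. It is sketched in Python rather than PHP purely to illustrate the flow; handle_request and the log_event callable are hypothetical stand-ins, not EventLogging APIs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def handle_request(url, log_event):
    """Log any campaign server-side, then redirect to the URL without it.

    Returns (status, headers) for a redirect, or None to serve the page.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    campaigns = [value for key, value in pairs if key == "campaign" and value]
    if not campaigns:
        return None  # nothing to log, serve the page as usual
    log_event({"campaign": campaigns[-1]})
    clean_query = urlencode([(k, v) for k, v in pairs if k != "campaign"])
    location = urlunsplit(parts._replace(query=clean_query))
    return 301, {"Location": location}
```

One thing to weigh with this approach: browsers may cache a 301 response, so a repeat visit to the same campaign URL might never reach the server and would then go unlogged.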