User:AGreen (WMF)/Draft:Mapping of EventLogging properties to FundraisingImpressions database
This page sets out how data will be input into the pgehres database by new scripts that will replace DjangoBannerStats.
Tables to deprecate[edit]
Table name | Summary | No e-com queries | No other uses reported |
---|---|---|---|
landingpageimpressions | Not populated by existing scripts, table is empty. | ||
globalcollect_orderids | Table doesn't exist on production, only in create.sql. | ||
globalcollect_orderids_unused | Table doesn't exist on production, only in create.sql. | ||
squidhost | Lookup table of Varnish (formerly squid) hosts. Referenced by landingpageimpression_raw and bannerimpression_raw, but those columns are not used in queries. | ||
squidrecord | Not populated by existing scripts, table is empty. |
Table for tracking file ingestion, to update[edit]
In legacy, the squidlog
table records which files that have been ingressed. We'll rename it to files
, and for backwards compatibility, create a view.
Also, we'll add several columns:
New column | Description |
---|---|
status | Enum column for the file's processing status. Can be processing or consumed . (Unprocessed files are not included in the table.)
|
consumed_events | Total events consumed from the file. |
ignored_events | Total events ignored in the file (i.e., read and validated but not stored). |
invalid_events | Total invalid events (or lines) in the file. |
Tables to maintain[edit]
Table | Description |
---|---|
language | Lookup table for languages. |
country | Lookup table for countries. |
project | Lookup table for projects. |
bannerimpressions | Aggregates CentralNotice banner impressions on specific dimensions. |
landingpageimpression_raw | Individual landing page pageview events (only certain properties). |
donatewiki_unique | Individual landing page pageview events (only certain properties). |
Mapping from LandingPageImpression events[edit]
Legacy could process events from two sites, wikimediafoundation.org and and donate.wikimedia.org, but only put events from the second site in donatewiki_unique.
Since effectively we're no longer receiving events from the first site, the new script will check all LandingPageImpression events for 'donatewiki'
in the wiki
property.
Input | Source | Validation | Transformation | DB location |
---|---|---|---|---|
dt | event metadata | Parseable by datetime.strptime() using configured format.
|
Parsed from JSON and provided to mysql.connector as datetime.datetime object.
|
landingpageimpression_raw.timestamp,
donatewiki_unique.timestamp |
country | event | ^[a-zA-Z]{2,2}$
|
Reference in lookup table. | landingpageimpression_raw.country_id |
language | event | ^[a-z]+([_-][a-zA-Z0-9]+)?$
|
Reference in lookup table. Default to 'en' if value is missing or is an empty string. | landingpageimpression_raw.language_id |
utm_source | event | Default validation regex (from config) | landingpageimpression_raw.utm_source,
donatewiki_unique.utm_source | |
utm_campaign | event | Default validation regex (from config) | landingpageimpression_raw.utm_campaign,
donatewiki_unique.utm_campaign | |
utm_medium | event | Default validation regex (from config) | landingpageimpression_raw.utm_medium | |
utm_key | event | Default validation regex (from config) | landingpageimpression_raw.utm_key | |
contact_id | event | Default validation regex (from config) | donatewiki_unique.contact_id | |
link_id | event | Default validation regex (from config) | donatewiki_unique.link_id | |
template | event | Default validation regex (from config) | Use only if landingpage is 'Special:LandingPage''. In that case, join all event properties with '~'; for any that are absent, use the string 'default'. | landingpageimpression_raw.landingpage |
appeal | event | Default validation regex (from config) | landingpageimpression_raw.landingpage | |
appeal_template | event | Default validation regex (from config) | landingpageimpression_raw.landingpage | |
form_template | event | Default validation regex (from config) | landingpageimpression_raw.landingpage | |
form_countryspecific | event | Default validation regex (from config) | landingpageimpression_raw.landingpage | |
landingpage | event | Default validation regex (from config) | If this value is not 'Special:LandingPage', remove the last two parts separated by ('/'); otherwise, ignore. | landingpageimpression_raw.landingpage |
sample_rate | event | (none) | ||
(none) | landingpageimpression_raw.squid_id,
landingpageimpression_raw.squid_sequence | |||
(none) | landingpageimpression_raw.processed |
Mapping from CentralNoticeImpression events[edit]
Input | Source | Validation | Transformation | Output | Notes |
---|---|---|---|---|---|
dt | event metadata | Parseable by datetime.strptime() using configured format.
|
Parsed from JSON and provided to mysql.connector as datetime.datetime object.
|
bannerimpressions.timestamp | |
banner | event | ^[A-Za-z0-9_]+$
|
bannerimpressions.banner | ||
campaign | event | Default validation regex (from config) | bannerimpressions.campaign | ||
project | event | ^[a-z0-9\-_\.]+$
|
Reference in lookup table. | bannerimpressions.project_id | |
uselang | event | ^[a-z]+([_-][a-zA-Z0-9]+)?$
|
Only certain languages processed separately, following this legacy setting. Remaining languages are aggregated. Reference in lookup table. | bannerimpressions.language_id | |
country | event | ^[a-zA-Z]{2,2}$
|
Reference in lookup table. | bannerimpressions.country_id | |
impressionEventSampleRate | event | (not used, as per legacy) | |||
(aggregate of all of the above) | Total number of hits with the same values for banner, campaign, project, language (see above) and country, received in a given log file, for a given minute, scaled according to combined (server- and client-side) sample rate. | bannerimpressions.count |