Analytics/Wikistats/Database API

= Proposed Reportcard API=

This document outlines the new API for the Reportcard. Please leave your thoughts at the Talk page. This proposal is incomplete.

Reportcard API
This document outlines the new API for the Reportcard. Please leave your thoughts at the Talk page.

Old API call for data analysis metrics

 * concept documentation for new API call

* action=analytics * Collect data from the analytics database. Parameters metric      - Type of data to collect. About metric names: these include source of data, to allow for alternate sources of similar metrics, which likely are defined differently or have other intrinsic issues (e.g. precision/reliability). One value: comscoreuniquevisitors definition: Unique persons that visited one of the Wikimedia wikis at least once in a certain month filters: selectregions, selectcountries [implementation: table comscore, field visitors] comscorereachpercentage definition: Percentage of total unique visitors to any web property which also visited a Wikimedia wiki filters: selectregions, selectcountries [implementation: table comscore, field reach] squidpageviews definition: Total articles (htm component) requested from nearly all Wikimedia wikis (exceptions are mostly special purpose wikis, e.g. wikimania wikis) Totals are based on the archived 1:1000 sampled squid logs. filters: selectregions, selectcountries, selectwebproperties, selectprojects, selectwikis, selectplatform [implementation: table page_views, field views_non_mobile_raw,views_mobile_raw,views_non_mobile_normalized,views_mobile_normalized depending on normalized and select_platform] dumparticlecount definition: All namespace 0 pages which contain an internal link minus redirect pages (for some projects extra namespaces qualify) filters: selectprojects, selectwikis [implementation: table comscore, field reach] dumpbinarycount definition: All binary files (nearly all of which are multimedia files) available for download/article inclusion on a wiki filters: selectprojects, selectwikis [implementation: table, field ] dumpedits definition: All edits on articles (as defined by dumparticlecount) filters: selectprojects, selectwikis [implementation: table wikistats, field edits] dumpnewregisterededitors definition: All registered editors that in a certain month for the first time crossed the threshold of 10 edits since signing up                     filters: selectprojects, selectwikis [implementation: table wikistats, field editors_new] dumpactiveeditors5 definition: All registered editors that made 5 or more edits in a certain month filters: selectprojects, selectwikis [implementation: table wikistats, field editors_ge_5] dumpactiveeditors100 definition: All registered editors that made 100 or more edits in a certain month filters: selectprojects, selectwikis [implementation: table wikistats, field editors_ge_100] estimatereadersoffline definition: People who access Wikipedia through an offline reader [implementation: table offline, field readers] other metrics which are likely to follow at some stage (for now included for brainstorm purposes only) squidpageedits worldbankpopulationpercountry worldbankinternetuserspercountry Parameter is always required startmonth     - First month to include in time series, or single date month to include One value: single month as yyyy-mm-dd Parameter is always required endmonth     - Last month to include in time series One value: single month as yyyy-mm-dd select... - Return data per month per qualifying row of data Specify per select parameters the criteria in any of four ways (only cB and cC can be combined): cA: * for all known values, e.g. selectregions=* cB: one or more codes separated by pipe. e.g. selectregions=NA|SA cC: one or more codes separated by plus sign, which returns required data totalled for all specified codes, e.g. selectregions=NA+SA cD: highest n (number) occurences, using values for most recent selected month for ranking, e.g. selectcountries=top:12 Available select.. parameters: selectregions cA cB cC                    for valid region codes see here selectcountries cB cC cD                    for valid country codes see here selectwebproperties cC cD                    This parameter requires extra authorisation Example: selectwebproperties=top:10 selectprojects cC                    for valid project codes see here selectwikis cC                    specify each wiki code as project:language, e.g. wp:en for English Wikipedia, wq:de for German Wikiquote Example: selectwikis=wp:en|wp:de selecteditors cB cC                      A for anonymous user, R for registered user, B for bot Example: selecteditors=R|A|R+A|B selectedits cB cC                      M for manual, B for bot-induced Example: selectedits=M|B selectplatform cB cC (only squidpageviews) M for mobile N for non-mobile (anyone knows a better term?) Example: selectplatform=M|N|M+N normalized  - Y or N                  Only applies to squidpageviews, where data for each month are recalculated to 30 days (other metrics may follow) Default: N (WMF Report Card will use normalized time series when available) data       - One or more type of data to be returned, separated by comma Values: timeseries returns ordered list of value pairs, on efor each month within range timeseriesindexed like timeseries, but each month's value will be relative to oldest month's value which is always 100 percentagegrowthlastmonth percentagegrowthlastyear, percentagegrowthfullperiod growth percentages are relative to oldest value (80->100=25%) although trivial, requesting these metrics through API ensures all clients use same calculation Default: timeseries reportlanguage - Language code, used to expand region and country codes into region and country name Default: en                 Supported: en     format       - (csv,json,... see elsewhere) . Examples: api.php?action=analytics&months=2008-03:2011-03&metric=squidpageviews&selectcountries=US|UK&selectmobile=M|N&normalized&data=timeseries|percentagegrowthlastmonth|percentagegrowthlastyear&format=xml returns four sets of metrics (time series plus two percentages) one for United States/mobile, one for United States/non-mobile, one for United Kingdom/mobile, one for United Kingdom/non-mobile

Filter select_regions
For comscore_... filters: AS = Asia Pacific C = China EU = Europe I = India LA = Latin-America MA = Middle-East/Africa NA = North-America US = United States W = World

Filter select_countries
Valid country codes are ISO 3166-1
 * List of ISO 3166-1 country codes and English name
 * About 3166-1 standard

Filter select_projects
wb = Wikibooks wk = Wiktionary wn = Wikinews wp = Wikipedia wq = Wikiquote ws = Wikisource wv = Wikiversity co = Commons wx = Other projects

Filter select_wikis
Specify as project:language For valid project codes see select_projects above For full lists of valid language codes per project see Wikimedia projects The following overview of exported wiki databases (aka dumps) can also be useful: lists of Wikimedia dumps

Return value
The return value will have the outermost group be the name of the metric, along with the API call used to generate this object, the start date of any time series returned, along with the granularity. The objects inside the main metric will have all the appropriate filters that were used to obtain the data, along with any constraints on the data.