Analytics/Wikistats/Database API

New API calls for data analysis metrics

 * concept documentation for new API call

 * action=analytics - Collect data from the analytics database.

Parameters

 * metric - Type of data to collect. One value. Parameter is always required.
   Metric names include the source of the data, to allow for alternate sources of similar metrics, which are likely defined differently or have other intrinsic issues (e.g. precision/reliability). Values:
   * comscore_unique_visitors - unique persons that visit one of the MediaWiki wikis at least once in a given month
   * comscore_reach_percentage - (definition to be added)
   * squid_pageviews - (definition to be added)
   * dump_article_count - (definition to be added)
   * dump_binary_count - (definition to be added)
   * dump_edits - (definition to be added)
   * dump_new_registered_editors - (definition to be added)
   * dump_active_editors_5 - (definition to be added)
   * dump_active_editors_100 - (definition to be added)
 * months - First and last month to include in the time series. One value: a single month as yyyy-mm, or a month range as yyyy-mm;yyyy-mm. Parameter is always required.
 * select_... - Return data per month for each qualifying row of data. Each select_ parameter takes its criteria in one of four forms (only cB and cC can be combined):
   * cA: * for all known values, e.g. select_regions=*
   * cB: one or more codes separated by commas, e.g. select_regions=NA,SA
   * cC: one or more codes separated by plus signs, which returns the requested data totalled over all specified codes, e.g. select_regions=NA+SA
   * cD: top:n, which returns the requested data for the n highest-ranking codes, ranked by their values in the most recent month, e.g. select_countries=top:12
   Available select_ parameters, with the forms each accepts:
   * select_regions (cA, cB, cC) - for valid region codes for comscore_.. metrics see .. (?)
   * select_countries (cB, cC, cD) - for valid country codes see .. (?)
   * select_web_properties (cC, cD) - requires extra authorisation
   * select_projects (cC) - for valid project codes see .. (?)
   * select_wikis (cC) - specify each wiki code as project:language, e.g. wp:en for English Wikipedia, wq:de for German Wikiquote. Example: select_wikis=wp:en,wp:de
   * select_editors (cB, cC) - A for anonymous, R for registered. Example: select_editors=R,A,R+A
   * select_platform (cB, cC; squid_pageviews only) - M for mobile, N for non-mobile (does anyone know a better term?). Example: select_platform=M,N,M+N
 * normalized - Y or N. Only applies to squid_pageviews (other metrics may follow). Default: N (the WMF Report Card will use normalized time series when available).
 * data - One or more types of data to return, separated by commas. Default: time_series. Values:
   * time_series - ordered list of month/value pairs, one for each month within the range
   * time_series_indexed - like time_series, but each month's value is expressed relative to the oldest month's value, which is always 100
   * percentage_growth_last_month, percentage_growth_last_year, percentage_growth_full_period - growth percentages, computed relative to the older of the two values compared (80 -> 100 = 25%). Although these are trivial to calculate, requesting them through the API ensures all clients use the same calculation.
 * lang - Language code, used for region and country names. Default: en. Supported: en
 * format - (csv,json,... see elsewhere)

Example:

api.php?action=analytics&months=2008-03;2011-03&metric=squid_pageviews&select_countries=US,UK&select_platform=M,N&normalized=Y&data=time_series,percentage_growth_last_month,percentage_growth_last_year&format=xml

returns four sets of metrics (time series plus two percentages): one for United States/mobile, one for United States/non-mobile, one for United Kingdom/mobile, and one for United Kingdom/non-mobile.
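
A minimal Python sketch of how a client might assemble a request like the squid_pageviews example, using the parameter names from the parameter list (months with a ";" range, select_platform). The relative `api.php` base and the `analytics_url` helper are hypothetical. The sketch assumes a PHP endpoint (as MediaWiki's api.php is), where a literal "+" in a query string is decoded as a space, so the "+" used for cC totals is percent-encoded as %2B while ";", "," and ":" stay readable.

```python
from urllib.parse import urlencode

# Hypothetical base; the documentation only shows a relative "api.php".
BASE = "api.php"


def analytics_url(params: dict[str, str]) -> str:
    """Build an action=analytics request URL from parameter values.

    ';', ',' and ':' are kept readable. '+' (used for cC totals) is
    percent-encoded as %2B, because PHP decodes a literal '+' in a
    query string as a space.
    """
    query = urlencode({"action": "analytics", **params}, safe=";,:")
    return f"{BASE}?{query}"


if __name__ == "__main__":
    print(analytics_url({
        "months": "2008-03;2011-03",
        "metric": "squid_pageviews",
        "select_countries": "US,UK",
        "select_platform": "M,N",
        "normalized": "Y",
        "data": "time_series,percentage_growth_last_month,percentage_growth_last_year",
        "format": "xml",
    }))
```

Leaving ";", "," and ":" unencoded is only a readability choice; percent-encoding them as well would be equally valid.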

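A sketch of how a server might validate the months parameter (a single yyyy-mm, or a yyyy-mm;yyyy-mm range) and expand it into the months of the time series. `parse_month` and `parse_months` are hypothetical helpers. They assume both ends of a range are included, which matches "first and last month to include".

```python
import re

MONTH = re.compile(r"(\d{4})-(\d{2})")


def parse_month(text: str) -> tuple[int, int]:
    """Parse one yyyy-mm value into (year, month)."""
    match = MONTH.fullmatch(text)
    if match is None:
        raise ValueError(f"expected yyyy-mm, got {text!r}")
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {text!r}")
    return year, month


def parse_months(value: str) -> list[str]:
    """Expand 'yyyy-mm' or 'yyyy-mm;yyyy-mm' into every month covered, inclusive."""
    parts = value.split(";")
    if len(parts) > 2:
        raise ValueError("months takes a single month or one range")
    first, last = parse_month(parts[0]), parse_month(parts[-1])
    if first > last:
        raise ValueError("first month is later than last month")
    months = []
    year, month = first
    while (year, month) <= last:
        months.append(f"{year:04d}-{month:02d}")
        # Step one month forward, rolling over to January of the next year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months
```

For example, parse_months("2010-11;2011-02") returns ["2010-11", "2010-12", "2011-01", "2011-02"], and a single month such as "2011-03" yields a one-element list.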

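A sketch of parsing and applying a select_ value under the four forms cA to cD. The names `Selection`, `parse_select` and `evaluate` are hypothetical, and the input data is assumed to be a mapping from each code to its monthly series (oldest month first, equal lengths). Commas separate output rows and a plus sign totals codes within a row, which is how cB and cC combine in select_editors=R,A,R+A.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Selection:
    """Parsed form of one select_... value."""
    all_codes: bool = False                   # cA: '*'
    top_n: Optional[int] = None               # cD: 'top:n'
    groups: tuple[tuple[str, ...], ...] = ()  # cB/cC: ',' separates rows, '+' totals codes


def parse_select(value: str) -> Selection:
    if value == "*":
        return Selection(all_codes=True)
    if value.startswith("top:"):
        n = int(value[len("top:"):])
        if n < 1:
            raise ValueError("top:n requires n >= 1")
        return Selection(top_n=n)
    groups = tuple(tuple(item.split("+")) for item in value.split(","))
    for codes in groups:
        # '*' and 'top:n' cannot be mixed with code lists; empty codes are typos.
        if any(code in ("", "*") or code.startswith("top:") for code in codes):
            raise ValueError(f"invalid selection {value!r}")
    return Selection(groups=groups)


def evaluate(sel: Selection, series: dict[str, list[float]]) -> dict[str, list[float]]:
    """Return one monthly series per output row.

    `series` maps each known code to its values, oldest month first;
    all series are assumed to have the same length.
    """
    if sel.all_codes:
        return dict(series)
    if sel.top_n is not None:
        # cD: rank codes by their value in the most recent month.
        ranked = sorted(series, key=lambda code: series[code][-1], reverse=True)
        return {code: series[code] for code in ranked[: sel.top_n]}
    rows = {}
    for codes in sel.groups:
        # A '+' group is totalled month by month; a single code passes through.
        rows["+".join(codes)] = [sum(month) for month in zip(*(series[c] for c in codes))]
    return rows
```

With {"NA": [10, 12], "SA": [5, 6], "EU": [20, 9]}, top:2 keeps NA and EU (12 and 9 in the latest month) and drops SA, even though EU led in the first month.
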
 * API calls (prototyping phase)

action=analytics &metric=[comscore_unique_visitors|comscore_reach_percentage] &month_range=yyyy-mm;yyyy-mm &filter=[ region_code=[aa,bb,..|*] | country_code=[aa,bb,..|*|top:10] | web_properties=top:10 ] &data=[time_series,time_series_cumulative,percentage_growth_last_month,percentage_growth_last_year,percentage_growth_series] [&lang=en] &format=[csv,json,text,...] &modality=[absolute|indexed]
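
The modality=[absolute|indexed] option in the call above (time_series_indexed in the concept documentation) rescales a series so that its oldest month equals 100. A sketch with hypothetical helper names; it assumes values ordered oldest month first and rejects a zero base, where the ratio is undefined.

```python
def indexed(series: list[float]) -> list[float]:
    """Rescale a monthly series (oldest month first) so the oldest month is 100."""
    if not series:
        return []
    base = series[0]
    if base == 0:
        raise ValueError("cannot index a series whose oldest value is 0")
    return [value * 100 / base for value in series]


def apply_modality(series: list[float], modality: str = "absolute") -> list[float]:
    """Return the series as-is (absolute) or rescaled to the oldest month (indexed)."""
    if modality == "absolute":
        return list(series)
    if modality == "indexed":
        return indexed(series)
    raise ValueError(f"unknown modality {modality!r}")
```

For example, [80, 100, 120] becomes [100.0, 125.0, 150.0] under modality=indexed.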

action=analytics &metric=squid_pageviews &month_range=yyyy-mm;yyyy-mm &filter=[ (none | ?) region_code=[aa,bb,..|*] | country_code=[aa,bb,..|*|top:10] | project_codes=[wb,wn,wk,...|*] (| wiki_codes=wp:en,wp:de,..|top:10) ] &data=[time_series,percentage_growth_last_month,percentage_growth_last_year,percentage_growth_series] &mobile=[Y|N|*] &normalized=[Y|N] [&lang=en] &returntype=[csv,json,text,...] &modality=[absolute|indexed]

action=analytics &metric=[dump_article_count|dump_binary_count|dump_edits|dump_new_registered_editors|dump_active_editors_[5,100]] &month_range=yyyy-mm;yyyy-mm &filter=[project_codes=[wb,wn,wk,...|*] | wiki_codes=wp:en,wp:de,..|top:10] &data=[time_series(,percentage_growth_last_month,percentage_growth_last_year,percentage_growth_series(?))] &normalized=[Y|N] &edits=[[un]registered|*] [&lang=en] &returntype=[csv,json,text,...] &modality=[absolute|indexed]
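
A sketch of the percentage_growth_* values requested through the data parameter in these calls, plus percentage_growth_full_period from the concept documentation, computing growth relative to the older value (80 -> 100 gives 25%). It assumes a series ordered oldest month first, takes "last year" to mean the month 12 months before the most recent one, and returns None when the series is too short. percentage_growth_series is left out because its definition is still open; `growth` and `growth_metrics` are hypothetical names.

```python
from typing import Optional


def growth(old: float, new: float) -> float:
    """Percentage growth relative to the older value: 80 -> 100 gives 25.0."""
    if old == 0:
        raise ValueError("growth from a zero value is undefined")
    return (new - old) * 100 / old


def growth_metrics(series: list[float]) -> dict[str, Optional[float]]:
    """percentage_growth_* values for a monthly series, oldest month first.

    A value is None when the series is too short to compute it.
    """
    if not series:
        raise ValueError("empty series")
    latest = series[-1]
    n = len(series)
    return {
        "percentage_growth_last_month": growth(series[-2], latest) if n >= 2 else None,
        # Assumes "last year" means the month 12 months before the latest one.
        "percentage_growth_last_year": growth(series[-13], latest) if n >= 13 else None,
        "percentage_growth_full_period": growth(series[0], latest) if n >= 2 else None,
    }
```

Computing these server-side, as the concept documentation suggests, keeps the choice of base value (the older one) identical for every client.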


 * Notes
 * unique_visitors rather than visitors, as comScore has variations that we might add later
 * concise syntax for month range as we'll use that a lot
 * * = all
 * lang=en default
 * is modality the right word? scale?
 * the returned result will contain an ordered (?) array of arrays, each containing region_code, region name, and an ordered set of yyyymm,value pairs (as it does now)
 * modality=absolute is also used with a log scale (so call it modality rather than scale? 'scale' rather suggests linear/logarithmic)
 * only time_series??
 * metric=squid_pageviews: also support 'language_codes=en,de,..', meaning data for all projects (Wiktionary, etc.) in that language? (Will we actually use this?)
 * 'filter=none' to make sure the requester really wants unfiltered data
 * verbose(?) option, or always return the exact call issued, maybe even the definition of the metric from the 'definitions' table
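
The notes on the returned result (an array of arrays holding region_code, region name and ordered yyyymm,value pairs) and on echoing the exact call issued can be sketched as two serializers. This is a minimal illustration, not the current output format: the envelope keys "request" and "data", the CSV column names, and the helper names `to_json` and `to_csv` are all hypothetical.

```python
import csv
import io
import json

# One result row per the note: code, display name, ordered (yyyymm, value) pairs.
Row = tuple[str, str, list[tuple[str, float]]]


def to_json(request: str, rows: list[Row]) -> str:
    """Serialize rows as an array of arrays, echoing the request issued.

    The "request" and "data" keys are hypothetical names.
    """
    data = [[code, name, [[yyyymm, value] for yyyymm, value in pairs]]
            for code, name, pairs in rows]
    return json.dumps({"request": request, "data": data})


def to_csv(rows: list[Row]) -> str:
    """Flatten rows to one CSV line per code and month (hypothetical columns)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["code", "name", "yyyymm", "value"])
    for code, name, pairs in rows:
        for yyyymm, value in pairs:
            writer.writerow([code, name, yyyymm, value])
    return out.getvalue()
```

JSON keeps the nested per-row structure from the note, while CSV has to flatten it to one line per code and month.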